AD-A233  030 


•NTATION  PAGE 


Form  Approve 
OMB  No.  0704-01  St 


him  to  tv*'*}*  '  "ouf  'noontt.  inuumn*  tnt  tim#  tsf  f«vi«winq  'Mtiuctiom.  uwcntnq  tiiutna  «u»u!C^ 
rtvtwmq  tr« collection  ot  mtormition  t»na  (emmtnn t Mitaimj  tmt  ouratfl  ntimitt  or  my  otnor  tiooct  erf  •?>. 
ourarf.  (O  vVivfinofS"  H«»aau»n»n  1«rvicf».  Oir«cor«t»  for  informotion  Oo*r»tion» mo  Amoto.  I jn  :««♦.*« 
OffKtOf  mo  Suaqtt.  fsoorwort  «Mi<tiOft  *fOl*rt  (070*^3  mi.WMfiir'qr  Oft.  OC  iOJQJ. 


AGENCY  US*  ONLY  (LtiVO  0 link) 


2.  REPORT  OAT* 

December  01, 1990 


3.  REPORT  TYPE  AND  OATES  COVERED 

Final  Report  for  Period  01  Nov.  88-31  Oct.  90 


S.  FUNDING  NUMBERS 


a  TiTLf  *MO  ?U»TlTtI  I  s-  ^UNUING  ."IVJIVUtK 

^Hrerarcnical  Neural  Network  (HNN)  For  Closed  Loop  Decision  Making."!  Grant  #  890010 
Subtitle:  "Designing  the  Architecture  of  a  Hierarchical  Neural  Network 
to  Model  Attention,  Learning  and  Goal  Oriented  Behavior." 


i.  AUTHOR(S) 

Allon  Guez 


7.  PERFORMING  ORGANIZATION  NAME(S)  ANO  AOORESS(ES) 


ECE  Department, 
Drexel  University, 
Philadelphia,  PA  19104 


■  8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER  _  a 

AEQSR-TR-  91  0136 


10.  SPONSORING/ MONITORING 
AGENCY  REPORT  NUMBER 


12*.  DISTRIBUTION  /AVAILABILITY  STATEMENT 

m 

•  isll 

Approved  for  public  release  * 

i  til  -  ‘ '  -•+ 5  o->. 

J _ _ _ f: _ 

13.  ABSTRACT  (Maximum  200  words)  / 

^  The  objectives  of  the  project;were  to  design  and  evaluate  a  hierarchical  neural  network  (HNN) 
capable  of  real  time  learning  and  decision  making  in  closed  loop. 

In  the  initial  stages  of  the  project  the  problem  was  defined  and  the  relating  state  of  the  art  methods 
were  surveyed.  Later  control  of. a  robodic  system  was  used  as  the  prototypical  task  and  a  HNN  was 
designed  and  compared  with  the  state  of  the  art  adaptive  control  techniques. 

During  this  project  the ioncept  of  exploratory  schedules  (ES)  was  developed.  ES  is  defined  as 
system  trajectories  intemally;generated  by  the  HNN  for  the  purpose  of  efficient  learning.  This  concept 
was  implemented  in  an  open-loop  fashion  for  the  control  of  robotic  manipulators.  A  theorem  was  proved 
that  gives  constructive  conditions  for  stable  leafing  in  closed  loop.  This  technique  yielded  improved 
transients  in  tracking  desired  trajectories  in  comparison  with  adaptive  control  methods.  HNN  architecture 
was  applied  as  a  controller  for  a  class  of  nonlinear  systems  linear  in  control.  It  was  shown  to  have 
guaranteed  asymptotic  stability.  HNN  architecture  was  employed  with  partial  success  in  areas  of  pattern 
recognition  and  control. 


14.  SUBJECT  TERMS 

IS.  NUMBER  OF  PAGES 

10 

Learning,  Neural  Networks  (NN),  Control,  Exploratory  Schedules 

IB.  PRICE  CODE 

17."  SECURITY  CLASSIFICATION  18.  TiCURlTYnOASSIFICATION  19.  SECURITY  CLASSIFICATION  20.  LIMITATION  OF  ABSTRACT 
OF  REFORT  OF  THIS  PAGE  OF  ABSTRACT 

Unclassified  Unclassified  Unclassified  UL 


NSN  7540-01 -280-5500 


St*nd».'d  Form  298  (R«v.  2-89) 
t*  ANSI  SM* 


DISCLAIM  NOTICE 


THIS  DOCUMENT  IS  BEST 

QUALITY  AVAILABLE.  THE  COPY 

FURNISHED  TO  DTIC  CONTAINED 

A  SIGNIFICANT  NUMBER  OF 

PAGES  WHICH  DO  NOT 
PEPunniTfE  r Emm  v 

J LlUi,  XWJL^  V  VJU  .  i.  I  HJr»  /  1  • 


Report  2 


HIERARCHICAL  NEURAL  NETWORK  (HNN)  FOR  CLOSED  LOOP  DECISION 
MAKING 


Designing  the  Architecture  of  a  Hierarchical  Neural  Network  to  Model 
Attention,  Learning  and  Goal  Oriented  Behavior 


Allon  Guez 

ECE  Department 
Drexel  University 
Philadelphia,  PA  19104 


L.. 

C.  .  '  .  < 

ki-.u.rO 

O'**  t 


)C 


**-  --gcr 


01  December  1990 


Final  Report  for  Period  01  November  88-31  October  90 


Prepared  for 
AFOSR/NE 


91  3 


07  07T 


Hierarchical  Neural  Network  (HNN)  For  Closed  Loop  Decision  Making 

Allon  Guez 

ECE  Department 
Drexel  University 
Philadelphia,  PA  19104 


Contents 

Page  # 

1  Accomplishments  and  Summary 

2  References 

3  Appendix:  Publications  partially  supported 

by  AFOSR  Grant  #  890010 


1 


Cv)  CO  oo 


This  is  the  final  report  for  the  AFOSR  grant  #890010  which  was 
granted  for  a  period  of  two  years  starting  on  November  1988  and 
expiring  on  November  1990.  It  contributed  to  the  publication  of  26  ! 
articles  of  which  7  are  archival  and  referred  journals.  For  further 
details  please  refer  to  the  annual  report  [AR1]  and  the  attached 
publications. 


Accomplishments 

The  main  accomplishments  of  the  project  are  summarized  as 
follows: 

Presenting  and  demonstrating  the  new  concept  of  exploratory 
schedules.  ([1], [5], [7-12], [17], [22], [27-29]). 

Demostrating  that  even  the  preliminary  version  of  HNN  has 
benefits  not  available  to  presently  employed  conventional 
methods.  ([l],[2],[5-7]). 

A  new  theorem  states  constructive  conditions  for  stable 
learning  in  closed  loop.  ([5], [7]). 

Guaranteed  stability  in  closed  loop  of  HNN  for  a  class  of 
nonlinear  systems  linear  in  control  [35-36], 

Application  of  the  approach  to  several  decision  making 
problems:  Robotics,  Pattern  Recognition,  and  Control.  ([3 -4],  [7- 
9], [13-16], [18-21], [23-26], [30-34]) 

Summary 

The  objectives  of  the  project  were  to  design  and  evaluate  a 
hierarchical  neural  network  (HNN)  capable  of  real  time  learning  and 
decision  making  in  closed  loop. 

In  the  initial  stages  of  the  project  the  problem  was  defined  and 
the  relating  state  of  the  art  methods  were  surveyed.  Later  control  of 
a  robotiic  system  was  used  as  the  prototypical  task  and  a  HNN  was 
designed  and  compared  with  the  state  of  the  art  adaptive  control 
techniques. 


2 


During  this  project  the  concept  of  exploratory  schedules  (ES) 
was  developed  [5], [7-12], [17], [22], [27-29].  ES  is  defined  as  system 
trajectories  internally  generated  by  the  HNN  for  the  purpose  of 
efficient  learning.  This  concept  was  implemented  in  an  open-loop 
fashion  for  the  control  of  robotic  manipulators.  A  theorem  was 
proved  that  gives  constructive  conditions  for  stable  learing  in  closed 
loop  [1-2], [5-7].  This  technique  yielded  improved  transients  in 

tracking  desired  trajectories  in  comparison  with  adaptive  control 
methods.  HNN  architecture  was  applied  as  a  controller  for  a  class  of 
nonlinear  systems  linear  in  control.  It  was  shown  to  have 
guaranteed  asymptotic  stability  [35,36].  HNN  architecture  was 
employed  with  partial  success  in  areas  of  pattern  recognition  and 
control  [3, 4, 13, 14, 18-21, 23-26], [30-34]. 
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Current  Perspectives  on  Neural  Networks 


The  Application  of  Neural  Networks  to  Robotics 


Ziauddin  Ahmad  John  Selintky  ABenGue* 

Drexel  University,  Department  of  Electrical  and  Composer  Engineering 
Commonwealth  Building,  Philadelphia,  PA  1910# 


We  report  on  the  application  of  Neural  Networks  (NN)  architecture  to  two  important  problems  in  robotics.  A 
hybrid  architecture  is  used  to  solve  the  inverse  kinematics  problem  with  performance  superior  to  the  that.of  the 
classical  approach.  We  also  utilize  a  hierarchical  NN  architecture  for  learning  rigid  robot  dynamics  far  the  purpose  of 
real  time  control. 

List  of  Terms 

NN :  Neural  Networks),  IKP :  Inverse  Kinematic  Problem ,  LMS  :  Least  Mean  Square,  FF :  FeedForward,  DOF 
:  Degree(s)  Of  Freedom, 


1.  Introduction 

In  this  chapter  we  summarize  the  results  of  [Guez  89b]  and  [Guez  90]  in  applying  NN  architecture  to  problems 
in  robot  control.  Section  2  describes  a  hybrid  architecture  used  to  solve  the  inverse  kinematics  problem  with 
performance  superior  to  the  that  of  the  classical  approach.  In  Section  3  we  utilize  a  hierarchical  NN  architecture  for 
learning  rigid  robot  dynamics  for  the  purpose  of  real  time  control. 

2.  The  Inverse  Kinematic  Problem  in  Robotics 

The  purpose  of  a  robot  is  to  manipulate  objects  with  its  end-effector.  The  end-effector  is  at  one  end  of  a  chain  of 
links,  connected  with  joints,  the  other  end  is  fixed.  The  motion  of  the  robot  is  linked  with  the  transformations 
between  the  joints  and  the  end-effector.  The  actuation  of  the  joints  result  in  a  motion  of  the  end-effector.  In  order 
for  the  end-effector,  carrying  an  object,  to  follow  a  desired  path  in  the  world  space  it  is  required  to  actuate  the  joints 
in  a  manner  that  will  make  the  end-effector  to  track  the  desired  trajectory. 

Solution  to  the  inverse  kinematics  exist  in  closed  form  for  some  robots  or  may  be  obtained  by  iterative 
methods.  However,  the  procedure  to  obtain  a  closed  form  soluton,  if  it  exists,  is  tedious  and  the  iterative  methods 
have  a  tendency  to  diverge  when  the  initial  guess  to  the  is  far  from  the  actual  solution. 
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The  inverse  kinematic  problem  (KP)  deals  with  finding  the  'n'  joint  angle  values  'q'  of  the  robot  that  will* 
position  the  end-effec^r  in  a  desired  position  and  orientation  'X'  ia  the 'm*  dimensional  world  space.  This  may  be  ? 
expressed  as:  •<  * 

(2.1) 

However,  in  general  this  solution  is  not  unique.  In  many  cases  (e.g.  redundant  manipulators)  there  may  result 
infinite  number  of  solutions.  In  these  cases  additional  constraints  [Yoshikawa]  in  terms  of  the  allowed 
configurations  or  performance  function  minimization  are  used  to  reduce  the  number  of  legitimate  joint  condonations 
or  to  single  out  a  unique  preferable  one. 


2.2  The  proposed  method 

The  formation  of  the  proposed  method  starts  with  the  robotic  manipulator  for  which  the  KP  is  to  be  solved  and 
a  'blank'  (untrained)  multilayered  feedforward  neural  network  of  suitable  size.  Then,  training  data  in  the  form  of  pain 
of  the  end-effector  position  and  orientation  X  *[  +  0  y  x  y  z]T,  and  the  corresponding  joint  values  q*[qi  q2 
. . .  qn)T,  are  generated.  These  data  are  used  for  training  the  NN  via  the  back  error  propagation  algorithm 
[Rumelhart].  The  position  and  orientation  vector  is  the  input  and  the  corresponding  joint  values  are  the  desired 
outputs  of  the  network.  After  the  training  is  completed  the  trained  NN  is  coupled  with  the  iterative  method,  as 
shown  in  the  Fig.  1,  for  the  purpose  of  operation.  During  the  operation  phase  the  desired  position  and  orientation  X 
of  the  end-effector  is  provided  to  the  NN.  The  NN  gives  the  approximate  solution  q0  based  on  the  learned 
connection  weights.  This  approximate  solution  is  taken  as  the  initial  guess  by  the  iterative  method  to  give  the  final 
solution  within  the  specified  tolerance.  We  consider  that  the  type  of  solution  out  of  the  finitely  many  solutions  is 
pre-specified  to  us  and  therefore  training  of  the  NN  is  restricted  to  the  set  of  examples  that  pertain  to  this  specific 
solution  only. 


(a)  (b) 

Fig.  1  Schematics  of  the  proposed  method  employing  the  NN  to  provide 
the  initial  guess  «q  to  the  iterative  procedure. 


2.3  Examples 

The  back  error  propagation  (BEP)  algorithm  simulating  a  three  layer  perception  was  employed  to  tackle  the 
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problems  described  below.  Continuous  inputs  and  outputs  were  assumed.  The  nodes  assumed  the  symmetric 
sigmoidal  nonlinearity  [Scott].  Parameters  of  the  NN  are  as  given  in  reference  [Guez-90].  Training  was  terminated 
when  it  was  seen  that  the  errors  were  not  improving. 

Here  we  show  an  attempt  to  capture  the  criteria  that  a  human  being  allegedly  optimizes  in  manipulating  different 
objects  by  training  the  NN  by  a  data  set  corresponding  to  some  specified  task.  Planar  motion,  parallel  to  the  ground 
was  the  considered  task.  The  subject  was  asked  to  move  ark  object  in  free  space,  in  a  plane  parallel  to  the  ground. 
Knowing  the  actual  distances  between  the  joints  the  data  set  was  filtered  to  achieve  a  10.0%  tolerance  about  the 
respective  actual  values.  The  data  set  thus  obtained  contained  only  43  words  out  of  a  total  of  78  words.  The 
network  for  this  case  constituted  of  2  inputs  and  3  output  nodes  and  two  hidden  layers,  each  containing  10  nodes. 

Fig.  4(a)  shows  the  plot  of  the  error  in  the  positioning  of  the  hand  resulting  for  the  trained  data  set,  while  Fig. 
4(b)  shows  the  same  for  a  different  data  set  obtained  separately  from  the  data  set  on  which  the  network  was  trained. 
The  two  figures  have  similar  errors  indicating  that  the  neural  network  has  generalized  on  the  trained  data  set  Large 
errors  near  X  =  0.5  m  are  perhaps  due  to  the  singularity  reasons  or  insufficient  data  near  that  region.  Further,  it  was 
observed  that  the  values  for  the  elbow  joint's  were  learned  much  better  than  those  for  the  wrist  joint  and  the  shoulder 
joint. 

As  seen  earlier,  the  filtered  data  set  was  only  55%  of  the  total  data  gathered  with  10%  tolerance  which  indicates 
that  the  precision  of  the  training  data  may  not  be  adequate.  However,  it  is  observed  that  the  NN  is  able  to  generalize 
upoo  the  training  data  giving  similar  results  for  the  untrained  data  to  that  of  the  trained  data  implying  that  implicit 
performance  indices  can  be  captured  via  NNs  and  perhaps  identified  via  weight  pruning  and  analysis. 


(a)  Hand  positioning  error  for  the  training 
data. 


(b)  Hand  positioning  error  for  other  than  the 
training  data. 


Fig.4:  Results  of  single  MFN  trained  on  human  arm  data  in  a  plane 
perpendicular  to  the  gravity.  Shoulder  is  located  at  (0,0). 
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2.3-2  PUMA56Q 

The  PUMA  560  parameters  were  taken  from  [Fu],  page  37.  This  manipulator,  which  is  a  6- DOF  robot,  was 
chosen  for  ease  of  generation  of  data  and  verification  of  results  since  it  has  a  dosed  form  solution.  PUMA  560  has  ^ 
eight  solutions  for  a  gffren  position  and  orientation  signified  by  Right/Left  -  Shoulder,  Above/Below  -  Elbow,  and  + 
Up/Down  -  Wrist.  In  Our  simulations  the  training  data  corresponds  to:  LEFT  Arm,  BELOW  Elbow  and  UP  Wrist 
configuration.  In  the  simulations  the  joint  limits  used  for  the  6th  joint  were  -180^  to  130°  instead  of  -266°  to 
266°. 

The  network  in  this  case  consisted  of  6  input  nodes,  one  output  node  each  for  the  6  joints  and  two  hidden  layers 
for  each  joint  consisting  of  32  nodes  in  the  first  layer  and  8  nodes  in  the  second  layer.  The  average  error  in  the 
solution  given  by  the  NN  for  each  joint  taken  over  100  samples  ranged  from  0.16°  to  6.02°  while  the  standard 
deviation  of  the  error  ranged  from  1 1.5°  to  36.8°.  This  solution  was  seen  to  be  more  scattered  for  Joints  4  to  6  as 
compared  with  the  joints  1  to  3. 

Next,  the  proposed  method  was  compared  by  giving  a  fixed  estimate  to  the  iterative  procedure.  This  Fixed 
Estimate  was  taken  as:  0j  =  0,  02  =  0,  0j  =  jc/4,  04  =  0,  @5  *  x/4,  and  0g  =  0,  which  is  a  configuration 
corresponding  to  which  the  NN  was  trained,  as  indicated  in  the  beginning  of  this  section.  In  the  simulations  the 
equations  were  solved  by  Gauss  Elimination  method  and  partial  pivoting.  The  maximum  number  of  iterations 
allowed  for  the  iterative  method  were  100.  The  iterative  method  was  successfully  terminated  when  the  norm  of  the 
difference  between  the  desired  and  actual  end-effector  position  and  orientation  was  less  than  1.0E-4.  The  average  and 
standard  deviation  for  the  number  of  iterations  for  the  proposed  method  and  the  Fixed  Estimator,  in  a  run  of  100  data 
points  is  given  in  Table  1.  From  this  table  we  can  see  that  the  proposed  method  achieves  more  than  a  two-fold 
efficiency  in  computing  on  the  average  with  better  consistency.  Moreover,  it  was  observed  that  the  time  taken  by 
the  NN  equals  two  time  units  of  the  iterative  procedure,  which  amounts  to  less  than  10%  of  the  time  required  to  get 
the  solution  by  the  Fixed  Estimator  method. 

Table.  1 

Comparison  of  the  proposed  solution  with  the  Fixed 
initial  guess  Newton  Raphson  Method,  for  PUMA- 
560  manipulator.  The  data  corresponds  to  100 
samples. 


The  proposed  hybrid  method  which  takes  the  solution  given  by  the  trained  NN  as  an  initial  guess  to  an  iterative 
procedure  (Newton-Raphson  in  our  case;  combines  the  advantages  of  the  NN  and  iterative  methods,  these  being  (1) 
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independent  of  the  type  of  the  manipulator,  (2)  simple  to  implement  Only  forward  kinematics  is  required  for  this 
method  and  as  shown  by  our  simulations  this  combination  results  in  »  inctease  in  computational  efficiency  by  2- 
fold  for  the  PUMA  56f  (6- DOF)  robot  This  results  in  minimal  processing  within  each  control  cycle  and  improves# 


real  time  control  performance. 


I 


3.  Learning  of  Robot  Dynamics  Using  Hierarchical  Neural  Network 


Adaptation,  in  control,  is  the  process  of  adjusting  the  controller  to  comply  with  the  regulation  and  tracking 
requirements  of  the  closed  loop  system.  During  operation,  the  controller  is  given  a  trajectory  by  a  path  planner  in 
order  to  accomplish  some  useful  task.  The  controller  then  adapts  the  parameters,  on  line,  so  as  to  satisfy  the 
tracking  requirements.  If  the  parameters  are  not  known  exactly,  there  will  be  a  transient  period  of  tracking  error 
while  adaptation  occurs.  So  that  identification  of  the  true  pvameters  is  desirable  for  increased  tracking  precision. 

Inverse  dynamics  based  control  algorithms  are  computationally  intensive,  and  may  result  in  prohibitively  slow 
control  rates  if  implemented  on  serial  computers.  A  parallel  implementation  of  the  above  adaptive  controller  is 
proposed.  The  proposed  implementation  utilizes  a  luerarchical  neural  network  architecture,  which  would  ideally 
provide  very  fast  control. 


3.1  Dynamic  Model 

A  rigid  robot  is  defined  as  an  open  kinematic  chain  of  rigid  links,  which  me  joined  by  linear  or  revolute  joints. 
The  dynamic  model  of  a  rigid  robot  manipulator  can  be  written  as 

T(t)  =  I(q,t)  q©  +  H(q,q,t)q  +  Bq(t)  +  G©©  (3.1) 


where 

q(t)  is  the  n  x  1  vector  of  joint  linear  or  angular  positions, 

I(q,t)  is  the  n  x  n  matrix  of  terms  related  to  inertial  forces, 

H(q,q,t)  is  the  n  x  1  vector  of  terms  related  to  centripetal  and  coriolis  forces, 

B  is  the  n  x  n  diagonal  matrix  of  viscous  friction  terms, 

G(q,t)  is  the  n  x  1  vector  of  terms  related  to  gravitational  forces, 

T(t)  is  the  n  x  1  vector  of  driving  forces  or  torques 

and  n  is  the  number  of  degrees  of  freedom  (DOF)  of  the  manipulator,  (for  further  details  see  [Paul],[Fu]). 

The  representation  (equation  (2.2))  is  not  unique.  Different  choices  of  0  result  in  different  structures  of  the 
known  function  matrix  Y[q,q,q,q].  In  this  work,  it  will  be  assumed  that  the  parameter  vector  0  will  represent  the  p 
parameters  to  be  identified. 


The  manipulator's  dynamics  can  be  represented  by  a  weighted  linear  combination  of  suitable  basis  functions. 
The  basis  functions  for  a  general  class  of  manipulators  are  known  apriori  and  may  be  trained  into  a  senes  of  three 
layer  feedforward  network  modules  pnor  to  use  and  then  combined  online  in  a  fourth  layer  using  a  suitable  weight 
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adjustment  rule  such  as  LMS  (See  [Widrow]). 

We  use  a  Unear  control  law  with  added  nonlinear  compensation  to  control  the  manipulator.  During  an  initial 
purely  learning  phase  and  whenever  the  manipulator  is  not  needed  for  production,  Exploratory  Schedules  (ES)  are 
used  to  isolate  and  identify  the  nonlinear  compensation  terms.  ES  are  trajectories  specifically  designed  for  efficient 
learning  of  the  system  dynamics.  A  block  diagram  of  the  closed  loop  system  is  shown  in  figure  2. 


Fig.  2  Block  diagram  of  the  closed  loop  system 


Learning  of  the  nonlinear  compensaUon  terms  is  as  follows.  Let  G0(q),  W0(q,q)  and  Io(q)  be  the  outputs  of  the 
G(q),  W(q,q)  and  I(q)  compensation  networks  respecuvely,  where  W(q,q)=  H(q,q,t)q  +  Bq  After  receiving  the  desired 
trajectory,  the  control  that  is  to  be  applied  to  the  robot  is  calculated  using  the  feedback  control  (FB)  from  the  linear 
controller  and  the  feedforward  control  (FF)  from  the  learned  nonlinear  compensation  terms. 

Learning  of  the  G(q)  terms  is  accomplished  during  posmoning  control  phases  of  the  trajectory.  A*  the  steady 
state  of  positioning  control,  q  =  q  =  0 .  Then  from  (3.1)  it  can  be  seen  that  T  =  G(q).  The  applied  torque  at  steady 
state  can  be  used  to  learn  the  G(q)  compensation  network. 

Learning  of  the  W(q,q)  compensaUon  terms  is  performed  during  constant  velocity  portions  of  the  trajectory.  To 
isolate  the  W(q,q)  terms,  notice  that  at  q  =  constant,  q  =  0  and  T  =  W(q,q)  +  G(q).  But  G(q)  has  already  been 
identified  and  is  available  as  G0(q),  so  that  we  may  isolate  W(q,q)  to  be  used  in  learning. 

The  inertia  related  terms  I(q)  may  be  conunuously  evaluated  using  the  aprion  known  relauonship  ( see  [Slotine]) 
dl(q,t)/ut  =  H(q,q)  +  H(q,q)T 

After  the  dynamic  equauons  of  the  manipulator  are  known,  a  feedback  lineanzauon  or  inverse  dynamics  based 
controller  (see  [Guez  82],  [Fu],  of  the  form 

T  =  Io(kp  e  +  kve  +  qd)  +  W0(q.q)  +  G0(q)  (3.2) 

where  e  =  qdtt)  -  q(,t),  and  qd(t)  is  the  posiuon  reference  signal  is  then  employed.  The  closed  loop  system  is  then 
equivalent  to 


q(t)=  kpe  +  kye  +  q^t) . 


(3.3) 


6 


Current  Perspectives  on  Neural  Networks 


Which  when  kp  and  kv  are  appropriately  chosen  results  in  an  asymptotically  stable  system.  Note  that  this  process 
does  not  presuppose  the  existence  of  an  omnipotent  controller  that  is  capable  of  accurately  tracking  arbitrary 
trajectories.  But  rathal  it  starts  out  with  a  simple  controller  capable  of  performing  a  limited  task  and  uses  learned  ■* 
knowledge  of  the  manipulators  dynamics  to  build  a  controller  capable  of  controlling  the  manipulator  at  high  speeds 
over  the  entire  work  space. 

3.3  Simulation  Results 

The  learning  algorithm  was  tested  on  a  simulated  two  planar  DOF  manipulator  (see  figure  3).  Note  that  in  these 
results  exactly  computed  basis  functions  were  used. 


Fig.  3  Two  degree  of  freedom  planar  manipulator. 


The  dynamic  equations  for  this  manipulator  can  be  represented  as  linearly  separable  nonlinear  subsystems 


.  o 


T1  =  (all+a12c2)01  +  (a13+a14c2)02  +  a15s2)0102  +  a16s202+a1701  +  a18S2  +  a19S12 


t2  =  (a21  +  a22c2/01  +  a23  02  +  a24s2  0 1  +  a2502  +  a26s  12 


where  C,  =  cos(0t),  Sj  =  sin(0x),  C,j  =  cos(0,  +  0,),  Sy  =  sin(0t  +  0j),  and  the  a,j  terms  are  weighting  constants 
to  be  identified. 

Utilization  of  the  ES  for  a  2  DOF  manipulator  result  in  small  tracking  errors  may  lead  to  small  errors  in  the 
parameter  identification.  As  can  be  seen  in  table  2,  the  learning  algorithm  was  able  to  idenufy  the  a,,  terms  to  a 

‘J 

very  close  approximauon  of  their  true  values.  The  comparison  of  the  2  DOF  manipulator  controller  before  and  after 
learning  via  ES  showed  that  the  controller  performance  is  significantly  improved.  Further  details  of  the  method  and 
results  can  be  found  in  [Guez  89b]. 
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* 

it 


Table.  2 


Performance  results  of  the  learning  algorithm 


Weight 

True 

Estimated 

Number 

Value 

Value 

ail 

30.0 

29.51 

a  21 

10.0 

10.06 

a  12 

20.0 

19.28 

a  22 

10.0 

10.20 

ai3 

10.0 

9.79 

a  23 

10.0 

10.12 

. a  14 

10.0 

9.47 

a24 

10.0 

10.22 

ai5 

-20.0 

-19.28 

a25 

5.0 

5.01 

ai6 

-10.0 

-9.97 

. -426 

98.1 

98.10 

a  17 

5.0 

4.61 

ai8 

196.2 

196,19 

ai9 

98.1 

98.10 

3.4  Conclusion 

The  exploratory  schedules  have  been  specified  as  a  desired  trajectory  that  is  to  be  followed  to  do  learning  while 
the  manipulator  is  not  doing  other  useful  tasks.  The  simulation  results  of  the  ES  for  a  2  DOF  manipulator  showed 
how  small  tracking  errors  may  lead  to  small  errors  in  the  parameter  identification.  The  comparison  of  the  2  DOF 
manipulator  controller  before  and  after  learning  via  ES  showed  that  the  controller  performance  is  significantly 
improved. 
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NEUROCONTROLLERS 

Abstract 

Neural  network  architectures  possess  computational  features  that 
may  be  useful  in  the  construction  of  complex  oynamical  system 
controllers.  In  this  article  we  review,  define  and  classify  such 
controllers,  named  neurocontrollers.  The  various  neurocontroller 
classes,  namely  the  supervised,  unsupervised,  fixed  architecture 
and  neurocontrollers  with  a  critic  are  shown  to  differ  mainly  in 
the  extent  of  apriori  (teacher)  knowledge  which  is  mechanized  in 
the  neurocontroller  architecture  itself,  rather  than  be  made 
externally  available.  We  discuss  the  neurocontroller's  role  as 
nonlinear,  learning  •  id  adaptive  controllers. 

Kev  Words:  Neural  Network,  Supervised  learning,  Unsupervised 
Learning,  Neurocontrol 
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I.  Introduction 

A  neural  network  is  a  network  of  a  large  number  of  neuronlike 
subsystems  which  are  dynamically  coupled  and  exhibit  via  their 
collective  behavior  useful  computational  features.  These  networks 
were  extensively  studied  by  many  researchers  over  the  past  40 
years. 

Many  different  models  have  been  suggested  for  neural  networks. 
When  all  the  neuron  subsystems  update  their  state  simultaneously 
the  network  is  called  synchronous,  otherwise,  we  have  an 
asynchronous  network.  If  the  state  of  each  neuron  is  represented 
with  finite  resolution  we  denote  it  a  finite  state  neural  network; 
else  it  is  a  continuous  network. 

Neural  networks  may  also  be  classified  by  their  principal 
operation  phases.  The  "production"  phase  is  the  one  in  which  the 
time  evolution  of  the  network’s  state  manifests  the  useful 
computational  properties  sought  for.  For  instance,  when  a  neural 
network  is  used  in  an  "associative  memory"  (i.e.,  retrieval  of 
information  by  content),  then  the  convergence  of  the  network  state 
to  a  stable  attractor  is  the  useful  activity  which,  accordingly,  is 
called  the  "production"  phase.  The  learning/adaptation/design 
phase  of  a  neural  network,  is  the  stage  in  which  the  network 
"learns",  modifies,  or  designs  its  internal  architecture  as  a  result 
of  its  interaction  with  the  environment  (external  input)  and 
according  to  "metarules"  which  are  inherent  to  the  global  context 
within  which  the  network  is  to  be  useful.  Early  neural  networks 
models  separate  between  these  operational  phases.  First  the 
network  is  operated  in  a  learning  mode,  where  the  network  state  is 
not  allowed  to  change,  but  the  neuron  interconnections 
(architecture)  are  modified  (designed);  then  learning  ceases  and 


Neurocontrollers,  pg.4 


"production"  is  initiated  by  exposing  the  network  to  external 
excitation,  which  yields  "useful"  (converging)  state  trajectories. 

In  other  models,  the  two  phases  are  intermixed,  allowing  the 
network  architecture  and  state  to  concurrently  evolve,  i.e . ,  the 
network  is  simultaneously  adaptive  and  productive. 

Understanding  of  the  structure  and  functionality  of  the  Central 
Nervous  System  (CNS)  of  higher  animals  is  important  not  only  to 
neuroscientists,  but  also  to  the  practitioners  and  theorists  of  the 
reemerging  field  of  "BioMorphic  Engineering"  (BME).  The  latter 
includes  Neuroengineering  as  a  specific  field. 

These  disciplines  adopt  the  theory  of  evolution  as  their  underlying 
axiom.  It  is  believed  that  the  solution  to  many  complex  engineering 
problems  may  be  found  by  duplicating  the  hardware  ard  functional 
structure  of  their  (biological)  analogous  problem.  The  resulting 
solution  is  accepted  as  optimal  one  since  evolution  is  regarded, 
albeit  slow,  as  a  recursive  optimization  procedure  via  the 
selection  of  new  biological  hardware  and  functional  structure.  The 
literature  on  BME,  bionics,  and  cybernetics  is  very  rich,  we  shall 
only  mention  ([Albus  75],  (Arbib  85],  [Rosenblat  61],  [Tsypkin71] 
.[Grossberg  and  Kuperstein  86],  [Berkenblit  et  al  86])  as  their 
context  is  closest  to  our  focus:  neurocontrol. 

Even  if  evolution  is  not  accepted  as  an  optimization  process,  an 
engineer  may  "borrow"  biological  engineering  principles  by 
practicing  the  so  called  "learning  analogy"  paradigm,  which  is  now 
widely  accepted  in  the  Artificial  Intelligence  (Al)  community  as 
complex  engineering  issues  frequently  posses  an  observable 
biological  counterpart. 


Thus,  the  principle  of  adopting  algorithms,  hardware  architectures 
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and  functional  structures  observed  in  nature,  to  the  solution  of 
engineering  problems,  without  necessarily  possessing  an  apriori 
rigorous  understanding  of  their  roles,  is  becoming  an  acceptable 
practice  in  the  reemerging  field  of  BME. 

The  design  of  neurocontrollers  is  a  specific  application  of  BME,  and 
is  the  focus  of  this  paper.  Although  neurocontrollers  demonstrate 
adaptation  and  learning  capabilities,  they  are  distinct  from  both 
adaptive  and  learning  controllers.  In  the  next  section,  we  define 
the  class  of  neurocontrollers,  compare  them  to  adaptive  and 
learning  controllers  and  show  how  neurocontrollers  are  related  to 
adaptive  and  learning  controllers.  Then  in  section  III  we  review 
several  neurocontroller  architectures,  and  classify  them  according 
to  the  underlying  learning  algorithms  and  neural  net  architectures 
being  employed.  We  propose  and  demonstrate  a  generalizing 
principle  for  all  neurocontrollers  architectures.  This  generalizing 
principle  may  allow  for  the  natural  incorporation  of  apriori 
knowledge  in  both  the  neural  network  architecture  and  learning 
algorithm  selection  for  the  neurocontroller  design. 

II.  Adaptive,  Learning,  Nonlinear  and,  .Neuroconlrollers 

In  this  section,  we  def-ne  the  class  of  neurocontrollers,  compare  it 
to  the  adaptive  and  learning  controllers  and  demonstrate  their 
relation  to  adaptive  and  learning  controllers. 

The  problem  of  systems  control  can  be  defined  as  follows.  Given 
the  system  S  where 

S  :  x  =  f(x,u,t,p);  x(t0)  =  x0  y  =  g(x,u,t,p)  (2.1) 

where  the  time  t  £  R,  the  initial  time  is  t0,  the  state  X  e  Sx  e  Rn, 
the  control  input  u  e  Su  e  Rm,  the  parameters  vector  p  e  Sp  e  Rl,  the 
sets  Sx,  Su  and  Sp  are  admiss;oe  „:j:e,  control  and  parameter 
sets  respectively,  which  account  for  general  equality  and 
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inequality  constraints,  and  where  n,m  and  I  are  appropriate 
integers.  Find  the  control  input  u(t)«c[v(t),  z,z]  (open  loop  control) 
or  u(t)«c[v(t),  x.z.z]  (dosed  loop  control,  such  that  certain 
terminal  and  optimality  conditions  are  satisfied,  where  v(t)  is  a 
reference  input  vector  and  z(t)  is  a  vector  describing  the 
controller  state  variables. 

Under  the  above  general  definition,  it  is  easy  to  classify  the 
various  controller  classes  according  to  the  type  of  control 
function  c  which  they  are  able  to  provide.  If  c  is  independent  of  z 
and  z,  it  defines  the  class  of  static  controllers,  in  particular  if  c 
is  also  linear  in  x  and  v.  we  obtain  the  class  of  linear  state 
feedback  controllers.  When  c  is  a  function  of  z  and  z,  we  obtain  a 
dynamic  controller,  where  z(t)  describe  the  controller's  state 
variables.  If  z  is  a  function  of  all  or  some  of  the  system 
parameters,  p,  we  obtain  the  class  of  adaptive  controllers.  (As 
there  is  no  unique  definition  of  adaptive  control  ([Astrom  87], 
[Landau  79]),  the  one  given  above  represents  the  author’s  choice.) 

An  important  feature  of  most  adaptive  controllers,  including  Model 
Reference  Adaptive  Controllers  (MRAC),  Self  Tuning  Regulators 
(STR)  and  Gain  Schedulers  (GS),  is  that  they  are  model  based,  that 
is  the  control  law  c  varies  according  to  the  best  estimate  of  some 
system  model  parameters,  p.  Usually  an  estimation  or 
identification  scheme  c*  -cme  o r  Vi  of  the  system  parameters  is 
directly  or  indirectly  employe  '.Astrom  87],  [Landau  79]).  The 
parameters  estimates  are  ren  ^sed  in  modifying  the  structure  of 
the  control  law,  c  in  an  fashion,  automating  in  the  process 

some  control  design  rules  A  ieneral  block  diagram  is  depicted  in 
Figure  1. 

In  contrast,  the  class  of  ’ear-  .  jntroller.  ([Arimoto  et  al 
85][Bondi  et  al  88],  (Kawamura  et  al  84])  also  called  repetitive 
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control,  or  betterment  control,  (to  be  distinguished  however  from 
learning  automata  as  described  in  [Narendra  and  Thatachar 
74][Tsypkin  71  ][Fu  87]  assume  that  the  reference  signal  v(t)  is 
periodic,  the  structure  of  the  controller  (i.e.  the  function  c)  is 
fixed,  and  learning  is  exhibited  through  the  Iterative  modification 
of  the  (open  loop)  time  function  uft).  Its  general  block  diagram  is 
described  in  Figure  2. 


Figure  1:  General  block  diagram  for  adaptive  controller 
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Figure  2:  General  block  diagram  for  learning  controller 


Based  upon  the  review  of  many  works  which  are  described  below 
Table  1),  the  class  of  neurocontrollers  is  hereby  defined  as  the 
set  of  all  controllers  whose  structure  (control  law) 
c(x,v,z,z)  is  based  on  some  neural  network  model 
architecture  and  learning  paradigm.  It  must  be  emphasized 
that  the  neural  network  model  and  the  learning  algorithm  are 
defined  independently  from  any  specific  process  to  be  controlled 
(S),  and  is  usually  contributed  by  CNS  anatomists  and 
psychphsiologists.  Neurocontrollers  are,  therefore,  not  model 
based,  in  the  sense  associated  usually  with  adaptive  controllers. 
Moreover,  since  the  learning  algorithm  employed  by 
neurocontrollers  do  not  require  periodic  input,  but  rather  can  learn 
from  arbitrary  example  sets,  the  class  of  learning  controllers  is  a 
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proper  subset  of  the  neurocontroller  set. 

Although  neurocontrollers  are  different  from  the  model  based 
adaptive  controllers  described  above,  they  do  perform  real  time 
adaptation  through  their  online  learning  capabilities.  The 
robustness  of  neurocontrollers  stems  from  the  fact  that  they  are 
not  model  based.  It  is  expected,  however,  that  when  compared  with 
classical  controllers  with  the  same  throughput  (processing  power) 
their  accuracy  will  be  inferior. 

Finally  when  some  or  all  of  the  nodes  of  the  neural  network 
employed  in  the  neurocontroller  are  specifically  allocated  for  the 
estimation  and  storage  of  all  or  a  subset  of  the  system 
parameters,  p,  we  obtain  a  neuromorphic  realization  of  model 
based  adaptive  controllers.  In  that  sense  one  may  regard  the  set  of 
adaptive  controllers  as  being  contained  in  the  class  of 
neurocontrollers. 

1 1 L. _ Ne.urQQQ  nlrQlle.cs 

Having  defined  the  class  of  neurocontrollers  in  the  previous 
section,  we  shall  now  focus  our  discussion  on  the  various 
architectures  and  neurocontroller  structures  found  in  the 
literature.  Neurocontrollers  are  classified  according  to:  1)  the 
neural  network  model  employed,  2)  the  learning  algorithm 
employed,  3)  their  mode  of  operation,  i.e.,  whether  the 
learning/adaptation  phase  is  done  offline,  prior  to  the  engagement 
of  the  controller,  or  whether  learning  and  adaptation  are 
continuously  present  in  the  neurocontroller  performance. 

Vast  amounts  of  literature,  which  deal  with  sensory  motor 
control,  understanding  anthropomorphic  motion,  robot  control, 
locomotion  control,  etc.  is  available  today.  Our  focus  has  been  on 
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reports  that  specifically  aim  at  employing  neural  network 

architectures  and _ learning  algorithms  in  the  design  and 

construction  of  controllers  for  artificial  dynamic  systems.  The 
criterion  implied  that  many  reports  on  the  use  of  neural  networks 
in  understanding  sensory  motor  control,  (in  the  biological  sense) 
and  in  learning  static  mapping,  with  or  without  feedback,  as  well 
as  reports  on  learning  automata  which  do  not  employ  neural 
networks,  were  outside  the  scope  of  our  study. 

Table  1  below  provides  a  classification  of  many  neurocontroller 
design  examples.  For  the  readers  convenience  we  shall  briefly 
introduce  the  relevant  terminology  and  algorithms. 
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Table  id 


I 

I 
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table  !c 
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Supervised  Neurocontrol  ier 
Back  Error  Propagation  -  (BEP) 

This  is  probably  the  most  commonly  used  learning  algorithm  its 
employment  presumes  a  feedforward  multilayer  perception  architecture 
as  described  in  Figure  3. 


output 


U  Output  Layer 


U  Second  Hidden  Layer 


U  First  Hidden  Layer 


Input 

Figure  3:  Three  layer  network 


The  network 


's  of  sev^a:  layers,  with  only  feedforward 


interconnects.  These  ere  n.  nodes  r he  Mn  layer.  The  output  of 
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each  node  is  usually  a  sigmoidal  activation  function  of  the  form 

k  l 

Xj  ,  - - - 

-<  Z  wf|xf'**Woi)  tXI> 

I  *  «  j.  i 


where  Wo!  is  a  bias  weight  that  is  assumed  to  be  connected  to  a  unit  that 
is  permanently  on,  Wkjj  is  the  connection  weight  between  the  ith  unit  in 
the  (k-l)th  layer  and  the  jth  unit  in  the  kth  layer,  xj  Is  the  output  of  the 
jth  unit  in  the  kth  layer. 

The  Back-Propagation  learning  algorithm  is  defined  as  follows.  For  the 
output  layer  the  error  is  given  by 

(3-2) 

where  Xd  is  the  desired  output  of  the  network.  For  ill  other  layers  the 
error  is 

.  .  \*i 

**  J  i  uA  _ 


M'-x  2 wfj  . 


The  rule  for  changing  the  .veignts  is 

wfj(n^)  =  w|tj(n)+  n5;Xj''  ♦  y(  wf/n)  -  wf/r.-D)  (3a) 

where  h  and  g  are  learning  *r.e  ano  momentum  factors  respectively. 


CMAC  (Cerebellar  Model  MlimilK  Computer)  Developed  by  (Aibus  75], 
CMAC  is  basically  a  table  ioo*  up  technique  for  reproducing  vectorial 
functions,  with  a  novel  ha:rog  method.  Based  on  theories  of  brain 
structure  and  functions,  aious  developed  an  architecture  which  consists 
of  an  ascending  "sensory  croc  ess  :r,g  hierarchy"  coupled  to  a  descending 
"goal  decomposition  hierarchy”  via  "world  models"  at  each  level.  Within 


mit.-metm  Comi 
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the  goal  decomposition  hierarchy,  sensory  feedback  variables  and  control 
goals  are  transformed  into  control  outputs  using  a  “ChAC"  module.  Albus 
implemented  a  controller  for  a  seven-degree  of  freedom  master-slave  arm 
using  the  CMAC  module.  However,  in  general,  the  CMAC  approach  requires 
too  much  memory  to  produce  control  transformations  with  sufficient 
accuracy  for  practical  application. 


IMS  (Least  Mean  Square)  or  Widrow  Hoff  and  Adaline 
The  following  IMS  algorithm  summary  has  been  adopted  from  [Barto  83]. 
Figure  4  shows  a  functional  diagram  and  schematic  symbol  for  an  adaptive 
linear  threshold  logic  element  (sometimes  called  “Adaline”)  [Widrow  85], 
(Widrow  Smith  64].  The  diagram  indicates  the  terminology  used  and  the 
input-output  relationships.  The  zeroth  input-signal  component  is  always 
+  1.  Thus  the  zeroth  weight  wo  controls  the  threshold  partitioning  level. 
Before  adaptation,  an  error  e  (defined  as  the  difference  between  the 
output  y  and  the  desired  resoonse  d)  exists  for  each  Input  pattern.  The  jth 
pattern  would  have  an  error  of 

ej  *  dj  -  yj  *  dj  -  X  jTy/j  (3.5) 


where  Xj  is  the  jth  input  pattern  vector  and  Wj  is  the  jth  weight  vector.  It 
is  assumed  here  that  the  weight  vector  is  adapted  with  each  new  input 
vector  Xj. 


If  the  inputs  Xj  and  dj  are  s! 
error  (mse)  is  a  quadratic  * 
optimal  Weiner  weight  vectc 


tatistieally  stationary,  then  the  mean-square 
unction  of  the  weights  and  there  exists  an 
:•  wh'c*  minimizes  It.  Learning  or  updating  of 


the  weights  can  be  done  by  a  Talent-descent  technique  The  least-mean- 
square  (LMS)  algorithm  devei.xed  be  Widrow  and  Hoff  uses  the  error  as  an 
estimate  of  the  gradient.  Tms  leads  to  the  weight  iteration  rule 

Wj H  *  Wj  ♦  (  a  /  to H ) )  ej  Xj  (3  6) 


where  n  ♦  I  is  the  total  number  of  weights  and  a  is  a  coefficient 
determining  the  fraction  of  the  error  ej  corrected  with  each  adaptation. 
The  parameter  a  controls  the  stability  of  the  adaptive  process  and  the 


Neurocontrollers,  pg.17 


rate  of  convergence.  The  adaptive  process  has  been  shown  to  be  stable 
(convergent)  if  a  is  within  the  range  2s  a  >0.  Choosing  a  in  this  range 
ensures  that  ej  is  reduced  by  the  jth  adaptation. 

The  "learning  curve"  plot  of  mse  versus  the  number  of  adaptation  cycles  is 
a  noisy  exponential  whose  time  constant  can  be  shown  to  be 

tmse*  (n+l)/2a  adaptions.  (3.7) 

Formula  (3.7)  is  exact  when  all  eigenvalues  of  the  input  correlation  matrix 
R  *  ElXjXjT]  are  identical.  Even  when  the  eigenvalues  differ  substantially 
and  the  learning  curve  is  not  simply  a  single  exponential  plus  noise,  but  is 
a  sum  of  expo  tials  plus  noise,  experience  has  shown  that  in  most  cases 
the  learning  a.  .e  can  be  well  approximated  by  a  single  exponential  having 
the  time  constant  given  by  (3.7). 
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DESIRED  RESPONSE 


Figure  4:  Automatically  adapted  threshold  logic  element  (Adeline) 

So  for  the  BEP,  CM  AC,  or, a  LMS  desrrioed  neural  network  architectures 
which  required  explicit  specifications  of  the  desired  response  of  each 
output  unit  at  all  times  Marneiy  some  kind  of  a  supervisor  or  a  teacher  is 
required  for  their  operation 


The  employment  of  BEP.  CMAC,  LMS  or  any  other  supervised  learning 
algorithm  with  its  underlying  neural  network  structure  (multilayer 
perception  for  BEP,  linear  threshold  units  for  LMS  or  simple  computer 
memory  for  CMAC)  may  c /  lumped  together  under  one  category,  hereby 
defined  as  the  supervised  neurocontr.-l’e'  f,  js  follows 
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Figure  5  describes  the  general  architecture  of  2  supervised 
neurocontroller  architecture,  it  consists  of  a  teacher,  the  trainable 
controller  and  the  process  to  be  controlled.  The  teacher  may  oe  automated 
as  a  linear  or  nonlinear  control  law,  or  it  may  be  a  human  expert  in  the 
case  of  training  with  an  automated  control  law,  the  system  engineer 
provides  knowledge  of  the  system  dynamics  through  analysis  of  tne 
controlled  process.  When  the  teacher  is  a  human,  we  use  knowledge  of  tne 
system  acquired  through  direct  experience.  The  controller  consists  of  one 
of  the  neural  network  architectures,  e.g.  LMS,  BEP,  or  CMAC,  that  is 
suitable  for  supervised  learning.  Supervised  learning  is  used  in  this 
architecture  to  provide  control  over  the  control  law  being  learned.  The 
state  of  the  controlled  process  is  provided  to  both  the  teacher  and  the 
network.  The  teacher  defines  the  desired  performance  of  the  controller  by 
providing  examples  of  now  me  process  is  to  be  controlled.  During  training, 
the  teacher  controls  the  system  oy  application  of  the  continuously  valued 
control  signal  u.  After  training  the  network  controls  the  process  and  the 
teacher  is  removed. 


Controlled  j  j 
Process  j — j  { 


feeJL'-  k  oCafe  varices  Sensors 


Figure  5:  Supervised  neurocontroller  architecture 

The  class  of  supervised  neu>'oconk  rollers  is  most  popular.  An 
overwhelming  majority  of  tne  ><ot  ■  -rji  ^controller  designs  employed 
an  architecture  similar  ro  *  he  one  given  in  Figure  5.  CMAC  is  employed  by 
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[Albus  31]  and  [Miller  88).  BEP  is  employed  by  many  including  uPsaitis  37], 
[Pao  88],  [Guez  Sel insky  88a],  (Guez  Ahmad  87],  (Josin  83]),  IMS  (Adai’re 
arrays)  is  used  by  ([Widrow  37],  [Kawato  87],  [Guez  Selinsky  88b],  [Witten 
77],  [Widrow  Stearns  35],  [Tolat  widrow  88)). 

Neurocontrollers  With  a  Critic  end  insuoervised  Neurocontrollers 
in  many  situations  the  assumptions  made  in  the  supervised 
neurocontroller  design,  regarding  the  availability  of  an  expert  knowledge 
to  "guide"  the  neurocontroller  during  learning  phases  may  be  false,  in 
particular,  in  neurocontrol ier  design,  the  desired  neurocontroller  output 
(i.e.  the  plant's  input)  for  a  specific  plant  response  is  often  unavailable, 
even  theoretically,  due  to  ignorance  about  the  environment  and  the  plant's 
dynamics.  Under  these  circumstances,  it  is  essential  for  the 
neurocontroller  to  possess  some  'self  improving"  or  "self  organizing" 
capabilities. 


.  .«*.  »«•  /  ,* 


We  distinguish  between  two  situations  The  first  is  r.jined  neurocontroller 
with  a  critic  fWidrow  37].  and  s  depicted  in  Figure  6.  The  teacher  in 
Figure  5  is  replaced  with  a  critic  The  key  difference  ^mg  that,  while  the 
teacher  is  able  to  provide  a  specific  desired  response  for  each  and  every 
output  unit  or  effector  in  rne  neural  network,  the  critic,  can  only 
scalar  function  which  rdicates  now  well  the  reurocor^al'er 
performing  it  is  lefr  fne  'earning  algorithm  ■■ 

neurocontroller  to  mal  e  us*  *  'r;s  scalar  critic  signal  *:se:ner 
systems  response  in  modifying  *•  ^  "e..;  . :  ant  roller  struv! .  -  r.  :s  : . 
of  neurocontrol  has  sevr^s- ve^'ors  -'uscaMed  'boots’- ac  .lactavi’' 
"punish/reward"  in  (Wid^w  e*  a  3 ').  'reinforced  learning  •■.[3 
et  a  I  81).  It  has  been  e-npioy^j  '■  -vdrow  87]  with  Aoarres.  ■  ':ar- 
al  82]  and  in  (Barto  Sutton  6/1  v. '-n  reir  associative  searon  ret .v;r* 
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Figure  6:  Neurocontroller  with  a  critic 

Another  class  of  neurocontrollers  named,  unsupervissd  neurocontrollers  is 
employed  whenever  no  external  information,  other  than  the  system's 
response,  is  available  in  real  time  to  the  neurocontroller,  regarding  its 
performance.  Then  in  a  manner  similar  to  adaptive  or  learning  controllers, 
the  neurocontroller  implements  some  internal  performance  index 
(specified  through  the  selection  of  the  neural  network  and  learning  model; 
minimization  as  its  adaptation  "--/amsrn 


Several  neural  network  mocs'S 
class.  The  Adaptive  Resonance 
Grossberg,  Carpenter,  Caoen  •/ 
87],  [Cohen  Grossberg  83 i  [Sr-:: 
maps  by  Kohonen  [Kohcnen  8 


sr.*:!  learning  algorithms  belong  to  that 
~reory  (ART)  based  sets  of  models  of 
f"eir  colleagues  ([Carpenter  Grossberg 
•rg  82]),  as  well  as  the  self  organizing 
;  The  latter  model  was  applied  to 


neurocontrol  by  [Graf  La  londe  88] 
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Furthermore,  several  workers  attempted  to  modify  supervised  learning 
algorithm  and  employ  them  in  an  unsupervised  mode.  Widrow  [Widrow 
Stearns37]  utilizes  his  IMS  algorithm  in  what  he  names  adaptive  inverse 
control. 

This  technique  is  illustrated  in  Figure  7.  The  plant  input  is  as  before.  The 
plant  output  is  the  input  to  the  adaptive  filter,  which  is  implemented  with 
an  Adaline.  The  desired  response  for  the  adaptive  filter  is  the  plant  input 
in  this  case.  Minimizing  mean  square  error  causes  the  adaptive  filter  p-i 
to  be  a  best  least  squares  inverse  to  the  plant  p  for  the  given  input 
spectrum  and  for  the  given  set  of  weights  of  the  adaptive  filter.  The 
adaptive  algorithm  attempts  to  make  the  cascade  of  plant  and  adaptive 
inverse  behave  like  a  unit  gain. 


Figure  7:  Adaptive  Inverse  control 


The  Adaptive  inverse  . 
modifications  m 
where  it  is  called  ">nd 
called  "feedback  error  i 


.r.trol  has  been  employed  in  various  forms  and 
•  .v  „nd  Stearns  35],  [Eisley  38],  [Psaltis  et  al  37]) 
-ect  learning"  and  [Kawato  et  a!  37]  when  it  is 

■an-  mg". 


If  the  8£P  algorithm  is  applied  to  configurations  which  lack  access  to  the 
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desired  neurocontroller  response,  we  may  use  the  output  of  the  unknown 
system  operator  to  propagate  back  the  error  as  follows.  Figure  5  descnoes 
the  neural  network  (multilayer  feedforward  type  with  L  layers)  and  me 
unknown  map  g  to  which  it  is  connected. 


Figure  8:  BEP  through  unknown  system 


Let  Xi  €  Rnl,  X1-  i  Rnl,  q  -  •  phi,  y  =  g(xL )  as  in  Figure  8.  Also  let  dj  be 
the  desired  value  of  v»  w  me  vector  xi  is  an  input,  let  ej  =  y\  -  d,  be 
the  i th  error.  Note  that  r:  msired  value  for  XL  is  specified  or  available. 
which  is  usually  the  case  n  controller  design.  It  car.  be  shown  that  the 
learning  equations  (corn  care  to  equation  (3.4))  are,  start  Log  at  the  last 
(output)  layer  connections. 


WJ  k(n+!)  =  W|  k(n)  -  n  ■ ' Xj 


L- 


4e;  \ 

j  r- 1 


/  L  L  .  1 

■  V  I'A'j  k(n)  "  Wj  >;n-  ’  'j 


i  Ju 


I  ' 


* 

Where  Wj  k(n)  is  the  con oect/or  a-  u  the  time  step  n  • 
in  layer  L- 1  to  thek^th  unit  -n  the  test  iaver  L,  where  f  is  anve 

of  f  the  node  update  •usual-ty  simm  :a!)  function,  =  ay ,/>/!:  estimated 

via  Ayi/AXk  and  XjN  t-fr  <rut».u‘  tne  jtn  unit  in  the  N.th  layer  tren 
proceeding  with  the  next  (t-D 
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Wj  fWi)  *  Wj’j'cn)  ♦ 


.AWn. 


k  ,.,L 


rTMwik(n) 


*  x 


f'Xi'2-y 


L-l  (-!  I 

Wj  j  (n)  -  Wj  j  in- !  !j 


(3,9). 


This  process  is  repeated  for  all  layers.  The  modified  BEP  makes  use  of  the 
usually  available  data  about  the  desired  systems  output  to  direct  learning. 


Fixed  Architecture  Neurocontrollers 

So  far  we  only  discussed  neurocontrollers,  which  allow  through  some 
online  or  offline  training  or  learning,  for  the  modification  of  their 
architectures.  Another  class  of  reurocontroliler  which  we  denote  as  fixed 
architecture  neurocontrollers  are  oased  upon  neural  network  models  with 
fixed  architectures  such  as  — ;of »eid  binary  and  continuous  models 
[Hopfield  82],  [Hopfield  3-J]  Examples  of  neurocontrollers  in  this  class  are 
given  in  [Tsutsumi  Matsumoto  V]  and  (Suez  et  al  88]  me  architecture  for 
these  neurocontroller  is  le’ected  according  to  a  cho  se  of  parameters  of 
an  energy  cost  function  to  oe  minimized  as  in  (Hopfie  :  Tank  85],  or  use  of 
the  "outer  product"  rule  1  ;n  re  binary  made!  case)  [r.,pfield  82).  Another 
method  for  the  selection  •■'e  neurocontroller  architecture  to  possess  a 
prescribed  steady  state  r.  ecology  is  described  in  (Guez  Protopopescu 
Barhen  88].  It  transform:  problem  to  the  Sv'urm  *>r  ? 

piecewise  linear  inequa'iv,-;. 


Based  on  the  neurocor-’oilers  c/iscussion  $ »ven  above,  easy  ' > 
the  following  observation  Compare  the  ^ck  diagrams  on  Figures  3  -3,  \ 
and  8.)  The  difference  between  tne  supervised  neuroc  Y.rr'm^:. 


neurocontrollers  witn  a  c ritic.  unsuservused  neuroconr.ro Hers,  and 
architecture  neurocofro  1 1  er$  mainb,  lies,  in  the-  amour  t  of  enrmr; 


knowledge  (teacher's  or  critics  Know  that  is  imO'S-iented  m  rre 
neurocontroller  model  For  example,  it  is  clear  from  Figures  5  through  o 
that  if  the  teacher  or  cncic  'unctions  uf  ?.  supervised  neurocontroller  are 
implemented  via  neural  network  roooel,  they  could  be  included  in  the 
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"neurocontroller  Pox"  and  result  in  an  unsupervised  neurocontroller. 
iv.  Summary 

Neural  network  architectures  possess  computational  features  that  may  be 
useful  in  the  construction  of  complex  dynamical  system  controllers,  in 
this  article  we  reviewed,  defined  and  classified  such  controllers,  named 
neurocontrollers.  The  various  neurocontroller  classes,  namely  the 
supervised,  unsupervised,  fixed  architecture  and  neurocontrollers  with  a 
critic  have  been  shown  to  differ  mainly  in  the  extent  of  apriori  (teacher) 
knowledge  which  is  mechanized  in  the  neurocontroller  architecture  itself, 
rather  than  made  externally  available.  We  also  discussed  the 
neurocontroller's  role  as  a  nonlinear,  learning  and  adaptive  controller. 
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ABSTRACT  . 

Indirect  adaptive  control  of  low  order  plants  that  are  subjected  to  parametric 
variations  arising  from  changes  in  operating  environment  requires  real  time  dynamic  system 
identification.  In  this  paper  we  propose  a  control  scheme  that  utilizes  a  nearest  neighbor 
search  type  of  classifier  capable  of  learning  to  dynamically  identify  these  variations  in  plant 
parameters.  The  neural  network  architecture  employed  is  based  on  the  Adaptive  Resonance 
Theory  (ART-II)  proposed  by  Stephen  Grossberg  and  Gail  Carpenter,  1987.  An  adaptive 
pole  placement  controller  for  a  slow  time  varying  linear  second  order  system  is 
implemented  based  upon  this  architecture  to  assess  the  performance  of  the  network  and  the 
overall  control  scheme  with  the  neural  network  in  the  control  loop.  The  control  strategy  is 
based  upon  identification  of  changes  in  the  time  response  characteristics  of  the  system  to 
standard  test  signals  which  are  assessed  by  the  network.  A  pole  placement  algorithm  is 
utilized  to  relocate  the  poles  of  the  overall  closed  loop  system  by  altering  the  gains  of  the 
process  controller  to  obtain  desired  system  response.  Experimental  studies  on  a  simulated 
system,  employing  a  Proportional  Derivative  controller  are  encouraging. 

Key  Words :  Adaptive  control;  System  identification;  Feature  extraction;  Pole  placement; 
Learning;  Supervised  learning  mode;  Parametric  variation. 

1  INTRODUCTION  AND  BACKGROUND 

The  most  significant  advantage  in  applying  neural  networks  lies  in  the  parallel 
distributed  nature  of  processing  (PDP)  that  can  be  realized  through  their  implementation  in 
hardware  [Rumelhart  &  McClelland,  1986;  Lippmann,1987].  In  addition  to  PDP,  artificial 
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neural  networks  demonstrate  the  ability  to  learn  a  task  either  through  training  by  example, 
(supervised  learning )  or  on  their  own,  by  means  of  heuristic  mechanisms  built  into  their 
architectures  ( Unsupervised  learning ).  The  necessity  of  specifying  explicit  instructions  in 
the  form  of  programs  or  algorithms  is  thereby  eliminated.  For  instance,  it  is  possible  to 
model  the  dynamics  of  an  unknown  process  by  imposing  on  a  neural  network  the  process 
inputs  and  outputs  to  enable  it  to  learn  the  transfer  function  in  the  form  of  a  mapping  from 
an  input  to  output  space.  [Psaltis,  1987;  Guez  &  Selinsky,  1988;  Guez  1988].  Fast 
association  and  recall  is  another  attribute  of  massively  interconnected  networks  wherein  the 
time  required  for  association  and  recall  of  information  after  training  remains  independent  of 
the  past  learning  history  or  the  size  of  the  network.  The  ability  of  a  neural  network  to 
generalize  from  insufficient  or  partial  information  is  found  to  be  especially  useful  when  the 
input  is  corrupted  by  noise  and  would  enable  it  to  be  a  succesful  computing  architecture  for 
adaptive  controllers. 

Adaptive  Control  :  The  starting  point  in  any  adaptive  control  scheme  is  a  feedback 
control  loop  within  the  process  and  a  controller  with  adjustable  parameters.  The  main  issue 
is  to  find  a  method  for  changing  the  controller  parameters  in  response  to  changes  in  the 
plant  parameters  resulting  from  changes  in  the  environment.  Of  the  many  schemes  for 
adaptive  control,  Model  Reference  Adaptive  Control  [Astrom,  1983,84;  Landau, 1979],  Self 
Tuning  Regulation  [Astrom,  1983,84;  Ortega  &  Kelly, 1984;  Cameroon  &  Seborg,1982] 
and  Gain  Scheduling  are  most  popular.  All  of  the  above  control  schemes  involve  a  system 
identification  component.  They  differ  only  in  the  techniques  they  employ  to  accomplish  the 
task  of  system  identification  and  tuning  of  the  controller.  Adaptive  Control  schemes  can  be 
broadly  classified  into  two  categories,  namely,  Direct  Adaptive  Control,  (DAC)  and 
Indirect  Adaptive  Control,  (IDAC).  In  the  former,  the  controller  parameters  are  directly 
adjusted  on-line  such  that  the  error  between  the  plant  output  and  the  output  of  an  assumed 
model  asymptotically  tends  to  zero.  In  IDAC,  the  parameters  of  the  unknown  plant  are  first 
estimated  using  an  estimation  technique  and  these  are  in  turn  used  to  adjust  the  parameters 
of  the  process  controller.  See  figure  1  for  a  block  diagram  a  generic  indirect  adaptive 
control  system. 


Figure  1  Block  diagram  of  a  generic  indirect  adaptive  control  scheme 

employing  a  system  identifier,  an  adjustment  mechanism 
and  a  process  controller. 

System  Identification  :  System  identification  is  the  experimental  approach  to  process 
modeling.  It  includes,  experimental  planning,  selection  of  model  structure,  parameter 
estimation  and  validation.  The  underlying  purpose  of  system  identification  in  control 
systems  is  to  design  control  strategies  or  to  alter  existing  strategies  to  obtain  desired 
performance  in  response  to  changes  in  the  system's  environment  characterized  by  changes 
in  the  system's  disturbance  dynamics.  System  identification  is  also  used  to  analyze  the 
properties  of  a  particular  system.  Literature  survey  on  system  identification  [Astrom  & 
Eykhoff,1971;  Astrom  &  Wittenmark,  197 1,1984;  Goodwin  and  Payne, 1977]  showed  that 
in  automatic  control  the  knowledge  about  a  system  and  its  environment,  which  is  required 
in  the  design  of  a  control  system,  is  seldom  available  a  priori.  Even  if  the  equations 
governing  the  system  are  known  in  principle  or  it  is  possible  to  obtain  them  by  performing 
experiments  on  the  system,  it  often  happens  that  knowledge  of  a  particular  parameter  is 
missing  or  the  system  is  too  complex  to  model.  Hence  the  need  for  approximating  system 
behavior  through  assumed  models  and  estimating  their  parameters  through  system 
identification  techniques.  Identification  problems  can  be  formulated  using  different 
frameworks.  Most  identification  problems  are  modelled  as  optimizationproblems  where 
the  main  objective  is  the  formulation  of  different  process  models  that  minimize  a  functional 
of  the  process  and  model  outputs  called  the  loss  function. 

In  this  paper,  we  report  on  an  attempt  to  formulate  the  system  identification 
problem,  for  the  purpose  of  adaptive  control,  in  an  identification  framework  that  exploits 


the  computational  features  of  an  Adaptive  Resonance  Theory  based  neural  network.  We 
have  chosen  the  ART-II  model  with  the  hope  that  the  network  would  learn  on  a  single 
presentation  of  an  input  pattern  and  that  it  would  be  self  organizing,  with  the  clustering  of 
input  patterns  being  done  without  an  explicit  specification  of  the  classification  desired.  The 
additional  advantages  of  using  the  ART-II  that  we  expected  were:  a)  Elimination  of  the  need 
to  retrain  the  network  when  there  is  a  change  in  the  prototype  set  of  input  patterns  or  when 
more  patterns  are  added  to  it;  b)  Better  control  over  the  classification  in  terms  of  fine  and 
coarse  categorization  via  the  attentional  vigilance  parameter  of  the  network  and  c)  The 
network's  capability  of  unsupervised  learning. 


2  DESCRIPTION  OF  THE  PROBLEM 

We  are  given  a  slow  time  varying  linear  dynamical  system  ( Plant  ),  Gp,  modeled, 
say  as  a  second  order  system,  with  plant  constant  Km  and  a ,  P  as  the  unknown  constant 
or  time  varying  parameters  and  a  process  controller  Gc,  say  a  Proportional  Derivative 
controller  with  gains  Kp  and  Kd.  Our  goal  is  to  implement  an  adaptive  pole  placement 
control  scheme  using  a  neural  network  architecture  that  would  identify  current  plant 
parameters  in  order  to  estimate  and  tune  arbitrarily  assigned  initial  gains  of  the  controller 
such  that  overall  system  poles  can  be  relocated  via  a  pole  placement  module  to  obtain  a 
system  response  that  matches  the  one  given  by  an  assumed  ideal  model.  It  should  be  noted 
that  although  the  plant  and  the  controller  are  linear  the  process  is  overall  nonlinear. 


Let  the  plant  transfer  function  be  represented  as 
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Gp  (s)  = 


Km 

s2+  as  +  P 


Let  the  reference  input  be  a  known  periodic  function  of  the  type: 


r  (t)  =  r  (t  +  T)  ,  for  all  t  >  0  (2) 

We  seek  to  find  online,  the  Proportional  and  Derivative  (PD)  gains,  Kp,  Kd  and  the  D.C. 
bias  C  (see  Figure  2),  such  that  the  control  law  given  by: 


u  =  Cr-Kpy-Kdsy  =  Cr-Gcy  (3) 

will  result  in  the  ideal  closed  loop  transfer  function  of  the  form: 


Gq(S)  = 


Y(s)  _ 
R(s) 


[s2+  2  C  +  £0n 


(4) 


where,  K*,  C  t  <o*n  are  respectively  the  desired  D.C.  gain ,  damping  coefficient  and  the 
natural  frequency  of  the  system. 


3  CONTROLLER  DESIGN 

We  describe  in  this  section,  the  design  aspects  of  the  proposed  controller 
highlighting  details  of  the  various  functioning  modules  involved  in  the  process.  We  first 
provide  an  explanation  of  our  approach  followed  by  the  underlying  methodology. 

Approach  :  The  general  block  diagram  of  the  overall  control  scheme  is  shown  in  figure 
3.  The  common  a  priori  assumption  in  the  design  of  controllers  for  partially  known 
processes  is  adopted  with  the  design  procedure  being  divided  into  two  steps:  identification 
and  control  (indirect  adaptive  control  strategy)  [Astrom  &  Eykhoff,1971]. 
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Figure  3  Block  diagram  of  the  neuromorphic  adaptive  control  scheme.  Parameters 
that  correspond  to  the  desired  location  or  the  overall  system  poles  are  input 
to  the  pole  placement  module  along  with  the  current  estimates  of  the  plant 
parameters  provided  by  the  neural  network.  The  pole  placement  module 
computes  the  required  controller  gains  and  the  D.C.  adjust  parameter 
to  modify  the  original  system  response  shown  at  the  output  of  the  plant. 


Methodology  :  Identification  of  plant  parameters  is  achieved  by  extracting  the  features  of 
the  system's  closed-loop  transient  time  response  to  a  step  input.  The  neural  network  in  the 
control  loop  which  is  trainsdio  map  features  of  a  system's  time  response  to  its  parameters, 
gives  the  current  parameter  estimates  of  the  plant  in  the  closed  loop  process  shown  in  figure 
2-  The  pole  placement  algorithm  incorporated  utilizes  the  estimates  provided  by  the  network 
to  compute  new  controller  gains  in  order  to  modify  the  resulting  system  response  to  suit  the 
one  obtained  by  the  assumed  ideal  model.  A  D.C.  adjustment  mechanism  is  incorporated 
to  compensate  for  any  D.C.  bias  that  might  be  associated  with  the  original  system 
response.. 


The  control  scheme  proposed  therefore  comprises  of  the  following 
fundamental  processing  modules:  (a)  Plant;  (b)  System  Identifier  incorporating  a  Feature 
Extractor,  a  neural  network  and  a  pole  placement  module  and  (c)  Controller  with  the  DC 
gain  adjust  mechanism,  the  purpose  of  the  system  identifier  is  to  identify  the  plant 
parameters  in  response  to  changes  within  its  environment.  The  time  response  of  the  system 
is  characterized  by  its  features  or  performance  indices  which  are  nonlinear  functions  of  the 
system  parameters.  Th t  feature  extractor  incorporated  in  the  control  loop  determines  the 
performance  indices  associated  with  the  response  to  enable  system  identification  via  the 
neural  network.  It  must  be  noted  that  identification  is  dependent  on  the  features  of  the  time 


7 


response  rather  than  the  response  per  se.  This  is  done  in  order  to  compress  the  informadon 
contained  in  the  response  such  that  the  input  vector  to  the  neural  network  remains  compact 
[Kumar  &  Guez,1989].  This  procedure  restricts  the  dimension  of  the  neural  network  to  a 
minimum  thereby  increasing  its  computational  speed  and  overall  efficiency  of  the  process. 

Pole  Placement :  The  overall  transfer  function  is  obtained  from  equations  (1),(2)  and  (3) 
as  explained  by  the  following  input  output  relations.  Refer  to  the  appendix  for  a  state  space 
description  of  the  proposed  adaptive  control  scheme.  The  output  of  the  process  is  given 
by: 

Y  (S)  =  Gp  (S)  U  (S)  =  Gp  (S)  [  C  R  (S)  -  Gc  (S)  Y  (S)]  (5) 

while  the  overall  transfer  function  of  the  process  is : 

G  M  Y<s>  C°pW  _ _ CKm  _  (6) 

0  R(s)  1  +Gp(s)Ge(s)  s2+(0  +  KmKd)s+  (P+KmKp) 

or 

G0(s) 
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v*  Q)  n  -P 

K  p  K~ 


km 


*  2  C  Q)*n  - « 


Kd= 


“■m 


P*_  K  ( P  +  KmK  p) 
Kr 


V*  * 

K  un 


“•m 


K 


m 


(12) 

(13) 

(14) 


Equations  (12)  through  (14)  constitute  the  pole  placement  algorithm  used  in  the  pole 
placement  module  shown  in  figure  3. 

The  Feature  Extractor : 


TIME  RESPONSE 


Figure  4  Block  diagram  of  the  feature  extractor  module. 


Th e  feature  extractor  in  the  adaptive  control  scheme,  illustrated  in  the  figure  4,  is  used  to 
determine  the  performance  indices  or  features  of  a  particular  time  response  during  its 
transient  state.  The  feature  extractor  is  implemented  in  software  and  the  performance 
indices  given  by  it  are  passed  to  the  identifier  which  is  implemented  in  the  form  of  a  neural 
network.  The  performance  indices  used  to  characterize  the  time  response  are:  a)  Delay  time , 
which  is  a  measure  of  the  time  required  by  the  response  to  reach  50%  of  the  final  value  in 
the  first  attempt;  b)  Rise  time,  which  measures  the  time  required  by  the  response  to  rise 
from  10%  to  90%  of  the  final  value;  c)  Settling  time,  which  is  a  measure  of  the  time 
required  by  the  response  to  reach  and  stay  within  a  specified  tolerance  band  (usually  2%  to 
5%)  of  its  final  value;  d)  Peak,  which  is  the  maximum  value  of  the  response  (For  the 
overdamped  case  and  the  critically  dampedtcase  the  peak  value  is  assumed  to  be  unit);  e) 
Peak  time,  which  is  the  time  required  for  the  response  to  reach  its  peak  value  and  f) 
Maximum  Overshoot,  which  is  the  largest  deviation  of  the  output  over  the  step  input  during 


the  transient  state  (%  maximum  overshoot  =  (Maximum  overshoot  /  final  value )  x  100). 
The  feature  extractor  keeps  track  of  the  response  values  and  the  time  at  each  sampling 
instant  over  an  adaptive  time  window  for  the  step  input  and  a  prespecified  time  window  for 
the  square  function  and  the  square  wave.  Since  the  majority  of  the  performance  indices 
depend  on  the  transient  part  of  the  response,  it  seems  valid  to  fix  the  size  of  the  underlying 
time  window.  The  sampling  frequency  is  chosen  to  be  well  over  the  Nyquist  frequency  of 
the  response.  This  enables  the  feature  extractor  to  compute  the  various  performance  indices 
with  a  desirable  accuracy. 

Neural  Network  :  The  system  identifier  in  the  control  loop  is  implemented  in  the  form 
of  the  ART-II  neural  network, [Grossberg  &  Carpenter,  1987(a), 87(c), 87(d); 
Grossberg,1988].  The  ART-II  is  a  member  of  the  class  of  Adaptive  Resonance 
architectures  that  is  designed  to  handle  both  binary  and  analog  patterns  and  is  a 
modification  over  the  first  proposed  ART  architecture  called  ART-I  [Grossberg  & 
Carpenter,  1987(b)].  See  appendix  for  the  mathematical  model  of  the  neural  network. 

Training  :  The  ART-II  neural  network  is  implemented  on  a  digital  computer.  The 
network  is  used  in  the  supervised  learning  mode  and  is  trained  off  line  before  its  inclusion 
into  the  control  loop.  The  training  data  for  the  net  is  provided  by  a  training  data  module 
also  implemented  digitally  on  a  computer.  It  is  also  possible  to  train  the  network  on  line  if 
the  system  emulator  is  included  in  the  overall  control  scheme  shown  in  figure  3.  The 
training  data  module  comprises  of  a  data  generator  whose  inputs  form  the  approximate 
ranges  of  the  system  parameters  which,  in  the  present  application,  are  the  natural  frequency 
and  the  damping  ratio  of  the  generalized  second  order  system  given  by: 

G(S)  =  - — - — 

s2+2£c%s  +  C0n, 

A  typical  range  of  the  natural  frequency  is  estimated  from  the  knowledge  the  plant  time 
constant  used  in  the  process.  The  damping  factor  is  assumed  to  vary  between  0  <  £  <  2.0. 
The  ranges  selected  thereof  cover  approximately  the  entire  gamut  of  the  systems  time 
response  to  maximum  deviation  in  the  plant  parameters. 

The  system  emulator  consisting  of  the  process  model  (the  generalised  transfer 
function  of  the  system,  which  in  our  case  is  second  order  )/ generates  the  simulated  system 


response  to  the  various  input  signals  using  different  settings  of  the  parameters  £  and  o>n 
selected  from  respective  parametric  ranges  specified. 


Figure  5 


Block  diagram  of  network  training  module  comprising  of  the  system  emulator 
the  feature  extractor  and  the  neural  network. 


To  start  data  generation,  parameters  £  and  o>n  are  set  at  their  initial  values  and  the  standard 
input  signal  (the  step  input )  is  applied.  The  input  to  the  feature  extractor  is  the  simulated 
system  response  and  its  output  is  a  feature  vector  comprising  of  the  response’s 
performance  indices.  These  along  with  the  respective  system  parameters  £  and  G>n  form  the 
training  data  set  or  the  prototype  set  of  patterns  for  the  neural  network.  See  table  1  for  a 
cross  section  of  typical  network  training  data,  generated  by  the  data  generator. 


CROSS  SECTION  OF  NETWORK  TRAINING  DATA 

PARAMETERS 

FEATURE  YECTOR 

? 

U) 

peak 

peek  time 

overshoot 

rise  time 

settling  time 

deity  time 

0.30 

1.50 

1.37 

2.20 

0.37 

1.31 

8.80 

0.80 

0.40 

1.50 

1.25 

2.28 

0.25 

1.44 

6.67 

0.84 

0.50 

1.50 

1.16 

2.40 

0.16 

1.61 

5.41 

0.89 

0.60 

1.50 

1.09 

2.60 

0.09 

1.85 

4.59 

0.94 

0.70 

1.50 

1.05 

2.92 

0.05 

2.19 

4.05 

1.00 

0.80 

1.50 

1.02 

3.48 

0.02 

2.78 

3.69 

1.05 

0.90 

1.50 

1.00 

4.80 

0.00 

4.12 

3.51 

1.11 

1.00 

1.50 

1.00 

0.00 

0.00 

1.88 

19.96 

0.52 

1.10 

1.50 

1.00 

0.00 

0.00 

2.92 

19.96 

0.80 

1.20 

1.50 

1.00 

0.00 

0.00 

3.52 

19.96 

0.96 

0.70 

2.00 

1.05 

2.20 

0.05 

1.64 

3.03 

0.75 

0.80 

2.00 

1.02 

2.60 

0.02 

2.08 

2.76 

0.79 

0.90 

2.00 

1.00 

3.60 

0.00 

3.09 

2.63 

0.83 

1.00 

2.00 

1.00 

0.00 

0.00 

1.40 

19.96 

0.36 

1.10 

2.00 

1.00 

0.00 

0.00 

2.20 

19.96 

0.60 

1.20 

2.00 

1.00 

0.00 

0.00 

2.64 

19.96 

0.72 

1,30 

2.00 

1.00 

0.00 

0.00 

3.00 

19.96 

0.84. 

Table  1  A  cross  section  of  typical  training  data  generated  by  the  network  training 

module  for  0.30  <  C  6- 1.30,  A  C  =  0.10  and  1.50  £  ©n  £  2.00,  A  (On  *  0.50. 
The  feature  vector  corresponding  to  different  settings  of  £and  ©n  represents 
the  performance  indices  of  the  response  that  are  obtained  from  the  feature 
extractor. 


Once  the  training  data  is  available,  the  network  is  configured  (see  appendix  for  details  on 
the  network  architecture)  such  that  the  number  of  nodes  in  the  feature  representation  field 
F  i ,  corresponds  to  the  dimension  of  the  input  feature  vector,  which  in  the  present 
application  is  fixed  at  six.  The  number  of  processing  nodes  in  the  category  representation 
field  F2,  is  generally  greater  than  total  number  of  input  patterns  in  the  prototype  set.  Each 
pattern  in  the  prototype  set  is  sequentially  presented  to  the  network  once.  A  second  cyclic 
presentation  of  the  prototype  set  may  be  made  for  a  stable  category  confirmation. 
Subsequent  presentations  do  not  alter  the  resulting  category  structure.  The  category 
structure  represents  the  state  space  partitioning  of  the  neural  network  depending  on  the 
number  of  stable  categories  established  during  training  and  the  different  feature  vectors  that 
were  classifed  into  these  categories.  During  training,  the  attentional  vigilance  parameter  is 
set  at  its  highest  value  (0.99)  to  ensure  a  high  resolution  of  the  resulting  category  structure. 
The  network  associates  with  each  new  category  established  the  system  parameters  C  and  ©n 
that  were  responsible  for  the  generation  of  the  corresponding  feature  fector. 


When  the  network  is  presented  with  a  feature  vector  for  the  first  time,  it  is 
encoded  in  the  LTM  through  modification  of  the  LTM  connection  weights.  A  node  is 
allocated  in  the  networks  category  representation  field  F2  to  represent  the  pattern.  The 
parameters  associated  with  the  feature  vector  now  get  assigned  to  this  allocated  F2  node. 
On  presentation  of  subsequent  feature  vectors,  the  network's  orienting  subsystem  first 
determines  closeness  of  match  between  the  pattern  currently  imposed  on  the  network  and 
any  of  the  patterns  that  have  previously  been  seen.  Since  the  vigilance  parameter  is  set  high 
a  new  node  is  allocated  for  the  pattern.  However,  if  the  current  pattern  happens  to  be 
closely  matched  to  the  one  the  network  has  already  seen,  it  is  clustered  into  the  same 
category.  It  is  therefore  possible  to  partition  the  networks  state  space  such  that  each 
partition  serves  as  an  attractor  for  a  particular  type  of  response  characterized  by  its  feature 
vector.  The  vigilance  parameter  helps  to  control  the  coarseness  or  fineness  of  classification 
desired.  After  completion  of  training  the  top  down  and  the  bottom  up  connection  weights  of 
the  network  along  with  the  network  parameters  are  saved  into  the  computer  memory.  The 
above  process  is  repeated  for  different  sets  of  values  for  £  and  wn  taken  from  their 
respective  ranges.  Increments  in  the  parameteric  values  of  the  system  during  training 
depend  upon  the  trade  off  between  accuracy  of  identification  desired  and  the  size 
constraints  on  the  network  design. 

Testing  :  When  the  network  is  introduced  in  the  control  loop,  identification  proceeds 
almost  instantly.  Search  for  the  category  associated  with  the  right  parameters  is  achieved  by 
dynamically  altering  the  attentional  vigilance  parameter  until  an  "optimal  vigilance"  is 
found.  The  testing  procedure  is  depicted  through  the  flow  diagram  in  figure  6. 


INPUT 


Figure  6  The  optimal  vigilance  category  search  procedure 

4  EXPERIMENTAL  RESULTS 

In  this  section  wereport  the  various  experimental  results  obtained  from  computer 
simulations  of  the  process  describeddn  sections  2  and  3  of  the  paper.  We  have  described 
the  simulation  procedure  and  provided  an  explanation  of  the  various  results  ootained. 

The  time  response  of  the  system  was  obtained  using  a  Runge-Kutta  fourth  order 
differential  equation  solver  (RK-4).  Athough  the  simulations  carried  out  on  the  sequential 
computer  were  slow,  the  results  obtained  suggest  that  a  hardware  implementation  of  the 
system  identification  module  would  lead  to  much  faster  identification  owing  to  the  parallel 
distributed  nature  of  information  processing  within  the  neural  network.  Throughout  the 
simulations  sensor  and  plant  noise  were  assumed  negligible. 


14 


The  neural  network  employed  is  trained  off-line  on  the  features  of  the  response 
obtained  from  the  generalized  second  order  system  with  the  following  parameteric 
variations: 

0.1  £(£-1.50,  A(  =  0.10 

0.5  £f>n£  2.50,  Aon=o.50  (15) 

These  parameters  are  not  made  available  to  the  system  identifier  but  are 
estimated  through  the  neural  network  in  the  control  loop,  based  upon  the 
features  of  the  system  time  response  to  the  standard  test  signals.  The  only 
available  information  to  the  feature  extractor  module  is  the  discrete  time  instant  and  the 
value  of  the  system  response  at  these  time  instants. 

We  have  assumed  our  ideal  system  to  have  the  following  parameters : 

Desired  damping,  =  0.70 

Desired  frequency,  con*  =  1.50 
Plant  constant,  Km  =1.00 

Plant  parameter,  a  =2.10 

Plant  parameter,  p  =1.50 

Specification  of  plant  parameters  a  and  p  determines  the  desired  location  of  the  system 
closed  loop  poles  within  the  left  half  of  the  S-  plane  and  enables  in  the  computation  of  the 
desired  damping  ,  C  and  the  desired  natural  frequency,  wn*  of  the  system  response.The 
responses  of  a  system  with  the  above  set  of  parameters  to  a  unit  step  input  and  the  square 
input  are  depicted  in  figures  7  and  8  respectively.  We  call  these  responses  "  nominal 
responses  of  the  assumed  ideal  model ". 
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Figure  7  The  desired  system  response  of  the  assumed  ideal  model  to  a  unit  step  input,  shown  in  bold.  The 

damping  coefficient  of  the  response,  £  =  0.70  and  the  natural  frequency,  ton  *  1.50. 


Figure  8  The  desired  system  response  of  the  assumed  ideal  model  to  a  square  function  input,  shown 
in  bold.  The  damping  coefficient  of  the  response,  £  =  0.70  and  the  natural  frequency, 
ton  =  1.50.  Feature  oxtracuon  to  this  type  of  input  is  restricted  to  the  first  12  seconds  of  the 
input  profile  before  the  cross  over. 


The  transient  response  of  a  system  that  has  been  disturbed -by  an  instantaneous 
change  in  the  reference  input,  like  the  unit  step,  constitutes  the  most  interesting  and  most 
significant  part  of  the  dynamic  behavior  of  the  system.  In  the  present  application,  for  the 
unit  step  input,  the  information  contained  in  entire  transient  response  was  captured  for 


feature  extraction  by  making  the  time  window  variable  depending  upon  the  duration  of  the 
transient.  In  the  case  of  the  square  function  and  the  square  wave,  feature  extraction  was 
restricted  to  the  step  portion  of  the  response  (12  seconds). 

Two  time  varying  functions  are  considered  to  represent  the  variation  in  the  plant 
parameters  a  and  (3.  The  plant  parameter  Km  is  kept  constant.  Figure  9  shows  a  linear 
variation  while  a  slow  periodic  variation  is  depicted  in  figure  10.  It  should  be  noted  that 
since  parameter  estimation  takes  place  only  during  the  step  portions  of  the  input  signal  and 
that  no  restrictions  are  placed  on  the  control  signal  u(t),  it  is  possible  to  add  a  'probing' 
signal  to  the  input  at  fixed  time  intervals  to  enable  system  identification  when  the  input 
profile  is  arbitrary.  The  performance  of  the  control  scheme  to  inputs  of  this  kind  is 
currently  under  investigation. 
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Figure  9 


Linear  time  variation  nt  plant  parameters  u  and  p. 


Figure  10  Periodic  sinusoidal  time  variation  of  plant  parameters  a  and  p. 

Since  the  components  of  the  feature  vector  are  not  all  independent  and  since  no 
effort  is  made  to  determine  whether  a  particular  prototype  training  set  forms  a  complete 
basis  for  all  input  vectors  encountered  by  the  controller  in  a  particular  situation,  a  network 
sensitivity  study  was  carried  out.  The  objective  of  this  study  was  to  determine  which 
components  of  the  feature  vector  influenced  the  network  most  in  terms  of  the  number  of 
categories  established.  Figure  1 1  shows  the  sensitivity  of  an  ART-II  neural  network  to  the 
different  components  of  the  feature  vector.  Seventeen  prototype  feature  vectors  were 
presented  to  the  network  at  a  vigilance  of  0.995.  The  graphs  depict  variation  of  the  different 
performance  indices  for  the  feature  vectors  versus  the  categories  established.  The 
sensitivity  plots  indicate  that  the  network  was  not  uniformly  sensitive  to  all  components  of 
the  feature  vector,  especially  in  the  overdamped  case  (£  >  1)  when  most  responses  look 
very  much  alike.  The  network  was  found  to  be  most  sensitive  to  the  delay  time,  rise  time 
and  the  peak  time  of  the  response  as  shown  in  the  figure.  To  enhance  the  feature  vector 
uniformly  it  was  passed  through  a  nonlinear  function  given  by: 


fi(x)  =  eyx  where, y  =  0.05 


(16) 


Dday  Tima  V*.  Catagorla*  EtUbllihad  Rti*  Tim#  V*.  Catagory  Eatabllaha* 


••*•*»»  **  »**«•*•*  Mtom  olaklVM 


Sattltng  Tim*  Vs.  C*t*g*ri*s  Establish**  Rtspons*  Paak  Vs.  Catagarlas  Establish** 


0121*3471  0  I  2  I  *  3  4  7  • 


Paak  Tima  Vs.  Catagorlas  Establlshad  Ovarshoot  Vs.  Catagorlas  Establlshad 


0123*3471  0  I  2  3  t  3  4  7  S 

otUftry  •  o«t«f*rv  «rtMlUlb«4 


Figure  1 1  The  abpvc  plots  indicate  the  sensitivity  of  the  neural  network  to  changes  in 
the  various  performance  indices  of  the  system  time  response  at  a  constant 
attentionaJ  vigilance  of  0.995.  The  input  pattern  set  comprised  if  17  different 
input  patterns  (number  of  points  plotted  on  the  graphs)  which  are  presented  to 
the  network  once  cyclically.  The  abscissa  represents  the  identification  number 
of  the  category  established  while  the  ordinate  shows  the  actual  value  of  the 
performance  index  under  consideration. 
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Figures  12  through  18,  each  depict  two  plots,  (a)  The  time  response  of  the  actual  system  to 
the  unit  step  input  signal  with  arbitrary  initial  parameters  a,  (3,  and  Km,  and  (b)  the  final 
system  response  with  the  D.C.  bias  or  the  steady-state  error  compensated  through  the 
parameter  C.  The  uncompensated  final  response  with  only  the  controller  parameters  Kp  and 
Kd  adjusted  after  the  system  identification  and  pole  placement  is  completed  is  not  shown  in 
the  plots. 


Figure  12  An  overdampcd  system  response  with  arbitrary  initial  plant  parameters,  a  =  0.5, 3  =  0.5 
and  Km  =  1.00  and  initial  controller  gains  Kp  =  0.50,  Kq  =  2.10.  The  response  has  an  initial  damping 

coefficient  £  =  1.30  and  an  initial  natural  frequency  ton  =  l.OO.The  f in  al;  controller  parameters  are  those 
obtained  after  the  plant  parameters  are  estimated  by  the  neural  network  system  identifier  and  after  the 
relocation  of  the  overall  closed  loop  system  poles  by  the  poleplacement  module.  The  Final  controller  gains 
correspond  to  Kp  =  1.750  and  Kq  *  1.60  ana  the  D.C.  bias,  C=  2.250.  The  final  response  has  the  desired 
damping  coefficient  £  =  0.7  and  the  desired  natural  frequency  ton  =  1.50  corresponding  to  the  ideal  system 
response.  The  time  scale  shown  in  the  figure  is  that  of  the  final  system  response.  The  time  scale  incorporated 
in  the  simulation  is  adaptive  for  the  unit  step  input  and  depends  upon  the  duration  of  the  system  transient 
rcsponsc.Thc  sampling  frequency  elected  is  common  to  both  the  response  sampling  rate  and  the  RK-4 
differenual  equauon  solver  and  is  set  weli  above  the  Nyquist  frequency  of  the  system. 
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Figure  13  An  example  of  an  overdamped  system  response  with  arbitrary  initial  plant  parameters,  a  = 
0.5,  P  =  0.5  and  Km  =  0.5  and  imual  controller  gains  Kp  =  0.50,  Kd  =  2.10.  The  response  has  an  initial 
damping  coefficient  C  =  1.20  and  an  initial  natural  frequency  (On  =  1.50.The  final  controller  gains  are  Kp  = 
0.50  and  Kd  =  -0.900  and  the  D.C.  bias,  C  =  4.50.  The  final  response  has  the  desired  damping  coefficient  C 
=  0.7  and  the  desired  natural  frequency  ton  =  1.50  corresponding  to  the  ideal  system  response. 
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Figure  14  An  example  where  the  identification  provided  by  -he.  network  is  not  accurate  owing  to  the 
discrete  nature  of  partiuoning  provided  by  the  neural  network  identifier.  A  finer  resolution  can  be  obtained  if 
the  size  of  the  neural  network  is  increased  by  increasing  the  number  of  nodes  at  the  category  representation 
field  F2.  This  would  eliminate  the  possibility  of  incorrect  classification  by  the  network  as  demonstrated  in  this 
example.  The  arbitrary  initial  plant  parameters  are  a  =  0.5,  P  =  0.5  and  Km  =  0.25  and  initial  controller  gains 
Kp  =  0.50,  Kd  =  2.10.  The  response  has  an  initial  damping  coefficient  £  =  1.20  and  an  initial  natural 
frequency  o)n  =  1.50.The  final  controller  gains  are  Kp  =  0.50  and  Kd  =  -0.900  and  the  D.C.  bias,  C  =  4.50. 
The  final  response  has  a  damping  coefficient  £  =  0.8  and  a  natural  frequency  con  =  1.00.  The  desired 
damping  and  natural  frequency  corresponding  to  the  ideal  system  response  are  C  =  0.7  and  (On  =  1.50 . 
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Figures  15  figure  depicting  the  ability  of  the  proposed  adaptive  controller  to  modify  an  underdamped 
system  response.  The  arbitrary  initial  plant  parameters  are  a  =  0.2,  P  =  2.5  and  Km  =  1.00  and  initial 
controller  gains  Kp  =  0.30,  Kj  =  0.5.  The  response  has  an  initial  damping  coefficient  £  =  0.20  and  an  initial 
natural  frequency  con  =  2.00.  It  should  be  noted  that  parameter  identification  of  an  underdamped  response  is 
easier  when  compared  to  the  overdamped  and  critically  damped  cases  in  which  the  system  responses  look 
alike.  In  the  underdamped  case,  the  overshoot  makes  one  of  the  feature  vector  components  more  significant. 
The  final  controller  gains  are  Kp  =1.25  and  Kd  =  1.80  and  the  D.C.  bias,  C  =  2.2$.  The  final  response  has 
a  damping  coefficient  C,  =  0.7  and  a  natural  frequency  con  =  1.50  corresponding  to  the  ideal  system  response. 
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Figures  16  Example  of  a  highly  oscillatory  underdamped  system  response  with  arbitrary  initial  plant 
parameters  are  a  =  0.2,  p  =  2.0  and  Km  =  1.00  and  initial  controller  gains  Kp  =  2.00,  Kj  =  0.2.  The 
response  has  an  initial  damping  coefficient  C,  =  0.10  and  an  initial  natural  frequency  con  =  2.00.  The  final 
controller  gains  are  Kp  =  0.25  and  K<j  =  1.90  and  the  D.C.  bias,  C  =  2.25.  The  final  response  has  a  damping 
coefficient  £  =  0.7  and  a  natural  frequency  (On  =  1.50  corresponding  to  the  ideal  system  response. 
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Figure  17  System  response  to  a  square  input  funr*  ion  with  arbitrary  initial  plant  parameters  are  a  *  0.5, 
P  =  0.5  and  Km  =  0.50  and  initial  controller  gains  K  0.50,  IQ  =  2.10.  The  initial  response  of  the  system 
is  first  obtained  and  the  portion  of  the  response  corresponding  to  the  first  12  seconds  (upto  the  cross  over)  is 
selected  for  feature  extraction  and  system  identification.lt  has  an  initial  damping  coefficient  C  =  1.20  and  an 
initial  natural  frequency  con  =  1.50.  The  response  after  the  controller  gains  have  been  adjusted  by  the  pole 
placement  module  is  depicted  as  the  final  system  response.  The  final  controller  gains  are  Kp  =  0.50  and  IQ  = 
-0.90  and  the  D.C.  bias,  C  =  4.50.  The  final  response  matches  the  desired  response  given  by  the  assumed 
ideal  model  which  has  a  damping  coefficient  £  =  0.7  and  a  natural  frequency  con  =  1.50  .  Note  that  the 
parameter  IQ  is  negative  implying  negative  derivative  feedback. 
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Figure  18  An  underdamped  system  response  to  a  square  input  function  with  arbitrary  initial  plant 
parameters  are  a  =  0.2,  p  =  2.5  and  Km  =  1.00  and  initial  controller  gains  Kp  =  3.00,  K<j  =  1.00.  The  initial 
damping  coefficient  £  =  0.20  and  the  initial  natural  frequency  (On  =  2.00.  The  final  controller  gains  are  Kp  = 
1.25  and  K<j  =  2.30  and  the  D.C.  bias,  C  =  2.25.  The  final  response  has  a  damping  coefficient  £  =  0.7  and  a 
natural  frequency  ton  =  1.50  corresponding  to  the  ideal  model. 
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TIME  VARYING  PLANT  PARAMETERS 
plant  constant ,  Km  :  1 .00 
a0  :  nominal  valu*  (1 .50) 

„  .  nominal  valu*  (2.10) 


ot(t)  =  oc0 ( 1  +r|  sin  («t)) 
(3(t)  =  (3o  (I  +T)sin  (»t)> 

oc  ( t )  :  plant  param*t#r 
p  :  plant  parameter 


f  =  r  6.366  E  -4 

21t 

r|  :  1.00 

a)  :  0.004 


Figure  20  Response  of  the  system  to  a  square  wave  with  a  sinusoidal  variation  of  plant  parameters  as 
depicted  in  Figure  11.  Since  there  is  no  probing  signal,  parameter  estimation  takes  place  discretely  at  a  time 
intervals  of  24  seconds.  Here  again  feature  extraction  is  restricted  to  the  step  portion  of  every  pulse.  The 
response  shown  in  the  first  24  seconds  is  the  unregulated  response  of  the  system  during  which  no  feature 
extraction  or  system  identification  lakes  place.  The  rest  of  the  response  profile  corresponds  to  the  desired 

■  ~  C.  bias  compensated  every  24  seconds.  The  change 

fact  that  variation  in  parameter  is  small  with  respect 
to  the  resolution  of  the  parametric' space  of  the  neural  network. 


Wsmsm 
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TIME  VARYING  PLANT  PARAMETERS 
plant  constant ,  Km  :  1 .00 
a0  :  nominal  valut  (1 .50) 

•  nominal  Yalu*  (2.10) 


oc(t)  =  OL0(1+r)t) 
p(t)  =  2.10  constant 

ot  ( t )  :  plant  parameter 
P  (t)  :  plant  parameter 


Figure  21  System  response  to  linear  time  variation  in  one  of  the  parameters,  a  while  Km  and  P  are  held 
constant  It  should  be  noted  that  the  response  should  settle  to  zero  before  the  next  pulse  is  applied  to  ensure 
proper  identification. 


5  CONCLUSION  AND  FUTURE  WORK 


A  new  approach  for  dynamic  system  identification  has  been  attempted  that  involves 
the  application  of  a  neural  network  architecture  based  on  the  Adaptive  Resonance  Theory.  It 
has  been  shown  that  it  is  possible  to  train  the  ART  neural  net  offline  (supervised  learning 
mode)  to  identify  the  parameters  of  a  simple  second  order  system  based  upon  attributes  of 
its  response  to  standard  test  signals.  A  simple  indirect  adaptive  control  scheme  was 
formulated,  implemented  and  tested  to  assess  the  performance  of  the  network  that  was 
incorporated  as  the  online  system  identifier  in  the  control  loop.  Experimental  results 
suggest  that  identification  provided  by  the  network  is  accurate  within  the  resolution  of  the 
training  data.  The  off  line  training  of  the  network  was  found  to  be  fast  with  the  training 
experimental  data  set  being  presented  to  the  network  only  once.  During  testing  when  the 
network  is  included  in  the  control  loop,  the  search  procedure  adopted  helps  to  improve  the 
identification  provided  by  the  network.  The  application  of  the  proposed  scheme  based  upon 
the  ART  architecture  has  to  be  studied  further  by  applying  it  to  other  higher  order  systems. 
Future  work  also  includes  a  critical  comparison  between  the  identification  techniques 
adopted  in  this  work  and  the  other  conventional  methods  such  as  the  method  of  Least 
Squares  on  the  same  set  of  problems.  Such  an  exercise  would  bring  into  sharper  focus  the 
relative  merits  and  demerits  of  the  proposed  neural  network  based  identifier.  It  would  also 
be  useful  to  study  the  possibility  ofcimplementing  the  network  such  that  it  is  made  capable 
of  learning  online.  Future  work  also  includes  an  evaluation  of  the  proposed  scheme  to  input 
signal  profiles  that  are  not  regular  via  overriding  probing  signals. 
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APPENDICES 

The  ART-II  Mathematical  model : 


The  equations  presented  in  this  section  are  taken  from  reference  [Grossberg  & 
Carpenter,  1987(a)].  For  the  convinience  of  the  reader  we  present  the  equations  used  in  the 
our  simulations  in  the  form  of  the  ART-II  mathematical  model.  Figure  22  depicts  the  ART- 
II  network  architecture  while  figure  23  the  network  topology  used  in  the  simulations. 

The  global  network  parameters  and  constants  are: 

M :  number  of  Fi  input  channels 
N  :  number  of  STM  nodes  at  stage  F2 

a :  network  parameter  (a  =  10.00) 
b :  network  parameter  (b  =  10.00) 
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c :  network  parameter  (c  =  0.10) 
d  :  network  parameter  (d  =  0.9  (0<d  <l)) 
e :  network  parameter  (e  =  0.0) 

0 :  noise  tolerance  parameter  (0=1/  VM) 
ETP:  error  tolerance  parameter  ( ETP  =  0.05) 
p:  attentional  vigilance  parameter  (0£p£l) 


F2STM 

LTM 


FjSTH 


Network  Equations: 

Refer  to  ART-II  network  topology  in  figure  23.  The  local  STM  activities  pi,  qi,  ui,  wi  and 
xi  computed  at  the  STM  stage  F\  (AH  equations  expressed  in  dimensionless  form)  are  : 

For  i  =  1 . ,M  and  j  =  M  +  l . N: 


P,=  Ui+  ?:g  lyj-)Zji 

J  J 

n  • 

(17) 

r.i 

e  +  j> 

(18) 

Uj=  vi 

1  e  +  iVi 

(19) 

vj=  f(xj)  +  bf(q|) 

(20) 

w|  =  I  j  +  auj 

(21) 

Wj 

X.  =  - — 

1  e  +  iwi 

(22) 
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|  V|  represents  the  L2  norm  of  the  vector  V. 

f(x)=  ifosxse 

(x2+e2> 

=  x  ifxS8 
where,  f(x)  is  the  thresholding  function. 

The  local  STM  activity,  Tj  computed  at  the  STM  stage  F2  is  given  by: 

Tj=?PiZij 

where  Zij  are  the  bottom  up  connection  weights  (Fi  ->  F2). 


(23) 


(24) 


The  LTM  Equations : 

d  (Z«) 

Top  Down  (F2  ->  FI) :  — ^ —  =  gtyj )  [Pf  Zjd 

Bottom  Up  (F!  ->F2) :  ISlL  =  g(y . )  [p..  Z|j] 
where, 

Zji :  Top  down  connection  weights  (F2  ->  Fi) 

Zij :  Bottom  up  connection  weights  (Fi  F2) 

g(yj )  =  d  { if  Tj  =  Max  Tj:  the  F2  node  that  has  not  been  reset  on  current  trial) 
0  otherwise 


(25) 

(26) 

(27) 


when  F2  makes  a  choice  and  the  J1*1 1T2  node  is  selected  during  contrast  enhancement 
process  (winner  take  all  or  on  center  off  surround  selection  procedure),  equations  25  and 
26  can  be  modified  as: 


(28) 

(29) 


for  j  *  J  : 


(30) 
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forj  * J : 


d(Zij) 

dt 


=  0 


and 


ISiL-o 


The  orienting  subsystem  consisting  of  the  reset  mechanism  resets  F2  whenever  the  input 


pattern  is  active  and 
where, 


-P->1 

e  +  in  (3D 

__Ui+CPL_ 

ri~e  +  iwi  +  jcpi  (32) 


r :  the  attentional  vigilance  parameter 

The  network  initialization  involves  setting  all  the  network  parameters  at  typical  values 
indicated  in  the  section  on  network  equations.  The  top-down  connection  weights  are  all 
initially  set  equal  to  0.  The  bottom-up  connection  weights  are  initialized  at: 


Zli~  2  (1-  d)  -/M 


(33) 


ART-11  Learning  algorithm: 

The  ART-II  learning  algorithm  used  in  all  the  simulations  that  were  carried  out 
is  presented  in  the  form  of  a  simulation  flow  diagram  shown  in  figure  24.  The  flow 
diagram  is  self  explanatory  and  represents  a  complete  cycle  of  events  within  the  network 
during  a  single  iteration.  The  major  features  of  the  algorithm  are  the  STM  stabilization  loop 
and  the  search  loop.  The  STM  stabilization  loop  ensures  that  all  STM  activities  at 
stabilize  (no  longer  changes)  before  a  bottom-up  input  pattern  can  be  transferred  from  Fj  to 
F2  through  the  LTM.  The  search  loop  comprises  of  the  bottom-up  and  top-down 
processing  mechanisms  and  the  vigilance  test  that  determines  the  appropriate  recognition 
category  corresponding  to  an  input  pattern. 
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Time  Optimal  Robot  Navigation  via  the 

Slack  Set  Method 

STEVEN  C.  ZAHARAK1S  and  ALLON  GUEZ 


Abstract — The  minimum-time  obstacle  avoidance  trajectory  fora  robot 
has  been  a  long  standing  problem  of  considerable  interest.  The  slack  set 
technique,  an  algorithm  for  determining  a  minimum-time  obstacle 
avoidance  trajectory  for  a  robot  in  a  known  environment,  is  presented. 
For  time  optimal  trajectories  with  constrained  acceleration  and  velocity, 
the  shortest  time  of  motion  may  be  different  for  each  joint  or  axis  of  the 
system.  Thus  some  delay  of  a  joint  other  than  the  slowest  will  no! 
necessarily  affect  the  time  of  motion  for  the  entire  system.  This  natural 
redundancy  for  obstacle  avoidance  in  order  to  simplify  the  trajectory 
search  algorithm  by  at  least  one  order  of  magnitude  (one  DOF  less)  is 
exploited.  By  neglecting  the  presence  of  all  obstacles  and  assigning  to 
each  actuator  maximum  control  (bang-bang),  a  lower-bound  estimate  of 
the  time  needed  to  complete  a  task  (Ttask)  is  calculated.  The  A* 
heuristic  search  is  used  to  search  what  we  call  the  slack  set;  a  subset  of 
the  stale  space  that  contains  only  those  states  that  are  members  of  a 
trajectory  with  a  task  time  equal  to  T  task.  If  no  trajectory  is  found 
during  the  initial  search,  the  subset  of  the  state  space  being  examined  is 
sequentially  increased  until  a  valid  trajectory  is  found.  The  slack  set 
technique  is  guaranteed  to  find  a  feasible  monotonic  trajectory  if  such  a 
trajectory  exists  in  the  slack  set.  Since,  in  general,  the  minimum-time 
obstacle  avoidance  trajectory  is  not  unique,  secondary  constraints  such 
as  minimum  distance,  minimum  distance  in  the  state  space,  and  others 
can  also  be  satisfied.  The  method  is  demonstrated  via  a  planar  mobile 
robot. 

I.  Introduction 

ROBOTS  have  been  applied  in  manufacturing  for 
many  years.  Today  robotics  has  found  applications 
in  such  diverse  fields  as  medicine  and  space  exploration. 
The  most  important  capability  of  a  robot  is  its  ability  to 
move  in  and  modify  its  environment.  After  almost  three 
decades  of  worldwide  research  m  the  area  of  robot  mo¬ 
tion  control,  the  state  of  art  offers  rather  simplistic  con¬ 
trol  and  motion  planning  methods  which  are  employed  m 
the  most  advanced  systems  of  industrial  robots.  The  cur¬ 
rent  state  of  knowledge  indicates  that  the  problem  of  time 
optimal  obstacle  avoidance  robot  trajectory  derivation  is 
still  unsolved.  Dynamic  constraints  imposed  by  the  robot's 
mechanical  structure,  and  actuators  have  yet  to  be  incor¬ 
porated  in  the  process  of  trajectory  planning.  The  impor¬ 
tance  of  inclusion  of  robot  dynamics  in  the  process  of 
path  planning  has  been  recognized  by  many  [1],  [10],  [28]. 
No  practical  technique  has  been  developed  due  to  the 
complexity  of  the  Droblem,  the  severity  of  the  computa- 
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tional  requirements  and  the  current  approach  to  the 
problem.  Motion  planning  and  execution  is  heuristically 
decoupled  mainly  into  two  independently  solved  subprob¬ 
lems:  path  planning  and  path  following.  This  artificial 
separation  of  the  control  and  planning  tasks  degrades  the 
overall  performance  of  the  system.  It  is  expected  with  the 
continuous  enhancement  of  speed  and  power  per  unit 
cost  of  processors,  the  advent  of  new  computers,  architec¬ 
tures  and  computational  algorithms,  that  a  global  simulta¬ 
neous  solution  of  some  of  the  mentioned  subproblems 
may  become  feasible. 

The  main  objective  of  this  work  is  to  develop  a  unified 
strategy  of  path  planning  and  control.  Our  approach  to 
time-optimal  obstacle  avoidance  trajectory  planning  uni¬ 
fies  several  problems  which  are  usually  considered  sepa¬ 
rately:  path  planning,  minimum  time  control,  real-time 
controller  structure,  and  obstacle  avoidance.  We  believe 
that  such  unification  gives  a  number  of  theoretical  bene¬ 
fits  and  helps  in  solving  problems  proven  to  be  difficult 
when  approached  separately.  The  objective  is  to  develop 
a  real-time  robot  motion  controller  that  will  simultane¬ 
ously  generate,  based  on  static  and  dynamic  constraints 
(including  parameters  of  the  system,  location  of  obstacles 
and  task  constraints),  the  time  optimal  obstacle  avoidance 
control  and  trajectory.  Our  approach  is  based  on  the 
observation  that  optimal  path  planning  cannot  be  achieved 
without  accounting  for  robot  dynamics,  especially  in  an 
obstacle  strewn  environment  [1],  [7],  This  dynamic  path 
planning  and  execution  is  feasible  by  exploiting  modern 
control  methods  such  as  global  linearization  and  decou¬ 
pling  via  state  feedback  and  the  Maximum  Principle  [5], 
[7],  This  approach  allows  us  to  treat  the  position  con¬ 
straints  as  a  part  of  the  overall  dynamic  constraints  on  the 
state  variables  of  the  system.  Other  benefits  of  this  ap¬ 
proach  are  the  ability  to  derive  time  optimal  obstacle 
avoidance  trajectories  that  also  display  secondary  opti¬ 
mizations  by  exploiting  the  fact  that  the  time  optimal 
trajectory  for  robot  manipulators  satisfying  inequality  con¬ 
straints  on  the  jerks,  accelerations  and  velocities  of  the 
joints  is  not  unique  [29].  This  “redundancy"  in  the  selec¬ 
tion  of  a  time  optimal  trajectory  creates  a  “slack  set” 
which  will  be  used  in  selecting  the  trajectory  that  is 
optimal  with  respect  to  some  prespecified  criterion.  The 
following  paragraphs  describe  results  related  to  our  work. 

The  area  of  path  planning  among  obstacles  has  been 
studied  mainly  during  the  last  ten  years.  Lozano-Perez 
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and  Wesley  [20]  proposed  a  method  for  finding  the  mini¬ 
mum  distance  path  through  a  polygonal  obstacle  infested 
workspace.  A  visibility  graph  (VGraph)  consisting  of  ver¬ 
tices  of  the  polygonal  obstacles,  the  initial  position,  and 
the  goal  position  are  used  [18].  Two  vertices  are  con¬ 
nected  with  edges  when  the  connecting  edge  does  not 
intersect  any  obstacle.  A  cost  is  assigned  to  each  edge 
according  to  its  length.  A  searching  algorithm  is  then 
employed  to  determine  the  set  of  these  edges,  which, 
when  combined,  lead  us  from  the  initial  position  to  the 
goal  position  with  the  minimal  total  cost  (minimum  dis¬ 
tance).  This  technique  will  be  used  as  a  comparison  to  the 
slack  set  technique  introduced  in  this  paper. 

Pieper  and  Widdoes  [27]  used  planes,  cylinders,  and 
spheres  to  represent  obstacles.  This  approach  has  the 
advantage  of  eliminating  the  orientation  problem,  but 
introduced  the  problem  of  path  elimination  due  to  obsta¬ 
cle  modification.  The  enlargement  of  the  obstacle  in  some 
cases  eliminates  feasible  paths.  Additionally,  the  intersec¬ 
tion-  functions  are  often  nonlinear  and  involve  square 
roots  or  transcendental  furctions.  Udupa  [27],  Lozano- 
Perez  and  Wesley  [19]  and  Brooks  [4]  adopted  the  polyhe- 
dra  as  the  models  that  result  in  linear  intersection  func¬ 
tions,  but  the  orientation  problem  must  be  handled  with 
care.  Udupa  discretized  the  space  into  sectroids  and 
passes  that  were  labeled  as  free  if  not  occupied  by  obsta¬ 
cles  and  objects.  Lists  of  free  passes  are  joined  together  to 
form  a  collision-free  path.  To  allow  for  arbitrary  orienta¬ 
tion,  obstacles  are  enlarged.  This  in  turn,  reduces  the 
number  and  size  of  the  free  passes.  Lozano-Perez  de¬ 
scribed  linked  polyhedra  using  swept  volumes.  Free  space 
is  then  represented  as  overlapping  generalized  cones.  In 
these  methods,  some  determine  the  free  space  inside 
which  the  point  robot  may  move  freely  without  collision, 
while  others  determine  the  forbidden  region  so  that  a 
collision-free  path  may  be  traced  along  the  boundaries  of 
the  region.  Luh  [21]  modifies  the  environment  by  inclu¬ 
sion  of  pseudo-obstacles  that  are  generated  by  real  obsta¬ 
cles’  edges  and  faces.  This  process  allows  the  robot  itself 
to  be  represented  by  a  point  specifying  the  robot’s  tip 
location  in  space. 

In  order  to  derive  the  control  required  to  traverse  the 
derived  path,  a  trajectory  must  be  found  for  the  path. 
This  is  generally  accomplished  by  searching  the  state 
space  for  the  velocity  profiles  that  will  allow  for  the 
traversability  of  the  given  path  [2],  [26],  [27].  Using  this 
technique,  one  can  derive  the  velocity  profile  that  allows 
for  the  minimum  time  traversal  of  a  path.  Note  that 
traversing  a  minimum  distance  path  in  minimum  time 
does  not,  in  general,  produce  a  minimum  time  trajectory. 
This  subject  will  be  discussed  in  detail  later  in  this  paper. 
None  of  these  works,  however,  accounts  for  manipulator 
dynamics  in  the  planning  process.  As  a  result,  the 
traversability  of  such  paths  is  questionable  due  to  the 
dynamic  and  mechanical  constraints  of  the  manipulator. 

Paul  et  al.  [23]  present  a  linear  programming  algorithm 
for  finding  the  near  minimum  time  path  of  the  manipula¬ 


tor  end  effector.  It  deals  with  path  planning  by  designing 
a  set  of  time  intetvals  for  the  transition  among  a  sequence 
of  preassigned  points  in  the  Cartesian  space.' The  motion 
between  each  pair  of  points  is  assumed  along  a  straight 
lien  segment  with  constant  velocity.  The  total  traveling 
time  is  minimized  while  observing  velocity  and  accelera¬ 
tion  inequality  constraints.  The  manipulator  dynamics  are 
not  included  since  the  path  is  assigned  a  priori.  This 
technique  cannot,  in  general,  produce  the  time  optimal 
traversal  of  a  given  path  due  to  the  fact  that  it  deals  with 
local  and  not  global  optimization.  Later  this  method  was 
extended  by  Lin  et  al.  to  fitting  of  cubic  polynomials  with 
velocity,  acceleration  and  jerk  constraints  in  planning  a 
minimum  time  trajectory  for  a  given  sequence  of  points. 
Lynch  [17]  derives  a  sequential  mode  (one  axis  at  a  time) 
minimum  time  for  two  axis  manipulator.  In  a  sequential 
mode,  coupling  between  the  various  axes  is  significantly 
reduced;  the  execution  time,  however,  is  much  larger  than 
in  the  one  obtained  in  simultaneous  motion  of  all  axes. 

The  following  techniques  incorporate  some  dynamics  of 
the  robot  during  their  planning.  The  idea  of  artificial 
potentials  [  13],  [15],  [22]  is  to  allow  to  move  about  space 
while  under  the  influence  of  the  forces  of  attraction  and 
repulsion.  The  attractive  force  would  stem  from  the  goal 
point,  while  all  obstacles  would  exert  repulsive  forces  on 
the  robot.  All  forces  would  be  directly  proportional  to  the 
velocity  by  which  the  source  is  approached  and  inversely 
proportional  to  the  distance  from  it,  thus  partial  inclusion 
of  robot  dynamics  is  achieved.  Loeff  and  Soni  added  to 
the  end  effector  of  a  manipulator  the  velocity  towards  the 
destination  and  gave  the  joints  repulsive  velocities  from 
the  obstacles.  Similarly,  Khatib  and  Le  Maitre  [13]  used 
an  attractive  force  on,the  end  effector  attracting  it  to  its 
desired  position  and  orientation.  Okutomi  and  Mori  [22] 
took  a  more  elaborate  approach  and  used  artificial  poten¬ 
tials  in  the  joint  coordinate  space.  Repulsive  forces  were 
exerted  from  all  forbidden  regions  in  this  space. 

The  minimum-time  path  for  a  robot  has  been  a  long¬ 
standing  and  unsolved'  problem  of  considerable  interest. 
For  problems  of  nontrivial  dynamics,  the  derivation  of  a 
time-optimal  solution  cannot  be  guaranteed.  This  is  true 
primarily  due  to»the  unavailability  of  an  analytical  solu¬ 
tion  and  the  limitations  of  today's  computers  which  do 
not  allow  for  the  verification  of  a  nontrivial  "time-optimal 
obstacle  avoidance  trajectory."  Though  the  optimal  con¬ 
trol  theory  of  dynamic  systems  is  well  established,  it  is 
rarely  used  in  practice  due  to  the  highly  nonlinear  and 
highly  coupled  form  of  the  differential  equations  that 
govern  the  system.  The  need  to  incorporate  heuristic 
knowledge  about  plausible  solutions,  along  with  the  abil¬ 
ity  to  make  tradeoffs  concerning  the  optimality  of -the 
solution,  as  well  as  the  computational  cost  of  the  deriva¬ 
tion.  have  been  outlined  in  [7],  [8]. 

One  category  of  trajectory  planning  techniques  uses  a 
bang-coast-bang  approach.  One  of  the  first  attempts  of 
time-optimal  trajectory;  derivation  was  made  by  Kahn  and 
Roth  [12],  His  technique  assumed  all  trajectories  to  follow 
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a  bang-bang  trajectory  with  multiple  switching  times.  Luh 
and  Lin  [20]  chose  to  subdivide  the  problem  into  the 
derivation  of  time  optimal  trajectories  connecting  various 
points  along  a  predefined  path.  Position,  velocity  and 
acceleration  constraints  of  the  weakest  actuator  are  im¬ 
posed  on  the  whole  system,  thus  severely  degrading  per¬ 
formance.  Komhauser  and  Brown  [14]  developed  a  tech¬ 
nique  based  on  state  space  tessellation  and  a  graph  search. 
In  this  technique  they  assume  a  bang-coast-bang  solution 
with  fixed  switching  points.  Successors  of  a  state  are 
generated  by  using  nine  torque  patterns.  Through  the  use 
of  a  heuristic  search  technique  the  fastest  trajectory  is 
found. 

The  time-scaling  technique  is  used  in  the  derivation  of 
a  trajectory  which  will  traverse  a  given  path  in  minimum 
time  [1].  The  equations  of  motion  are  expressed  in  terms 
of  the  distance  along  the  predetermined  cartesian  path.  A 
finite  distance-velocity  state  space  is  derived  using  the 
torques/forces  specifications  of  the  motors.  A  set  of 
switching  points  is  then  found  that  moves  the  arm  as  close 
as  possible  to  the  limits  of  the  state  space.  The  optimality 
of  this  algorithm  was  proven  by  Bobrow  in  his  Ph.D. 
paper.  He  found  that  one  actuator  is  always  saturated, 
and  that  the  others  adjust  their  torques.  Other  time-scal¬ 
ing  implementations  [11],  [17]  use  polynomial  splines  for 
the  initial  path  definition.  Iterative  modification  of  these 
paths  is  attempted  until  convergence  of  a  local  minima  is 
achieved. 


II.  Problem  Statement 


Given  the  dynamic  model  of  the  N  degree  of  freedom 
(NDOF)  robot  with  its  N  DC  Servo  actuators  driving 
joints: 

*l=*2 

X2  =  g{X,t) 

Xy=  F)X2  +  F2  Xy+  L  ~  'V  (1) 

where 


*1  ~  :(*II'^I2>  XiN) 

is  the  joint  displacements  vector 

Xz  =  ( X2\,  X22,  •  •  •  X2N)t  is  the  joint  velocities  vector 

Xy  —  ( Xyi ,  Xji  > ' '  ‘  X^f 

is  the  rotor  currents  of  the;actuators  vector 

is  the  vector  of  applied  input  voltages. 

Given  the  initial  configuration 

*('«)  =  (2) 

the  desirable  final  configuration 

X(t/)  =  (X\(tf),  X2(tf),  X2(tj r))  (3) 


the  inequality  constraints 

<  <om. 


<h  (4) 

where  a>m, ,a„,A(  are  the  maximum  allowed  velocity, 
acceleration  and  jerk  respectively  for  the  /th  joint. 

The  sets  of  forbidden  regions  are  assumed  to  be  given 
in  the  joint  space  (Af,)  as  time  varying  intervals  which  may 
or  may  not  be  connected.  These  are  obtained  from  the 
kinematic  translation  of  the  obstacle’s  configuration  in 
the  workspace,  thus  Given  also  the  sets  of  forbidden 
regions  (obstacle)  for  each  joint  Sf(f) 

where  Sf(r)  e  R ,  for  all  t  e(r0,r/),  i«  1,2,3,-  •  -,N 

and  the  joints  hard  limits 

®/min  ^  X/j  <  Q( mu .  (5) 

From  among  all  the  state  trajectories  that  satisfy  (1) 
through  (5),  (i.e.,  feasible  trajectories),  and  minimize  the 
travel  time  (6) 


1*2,1 

dx2j 

~dt 

d2x2i 

nr 


tf-to-  f'ldt  (6) 

-h 

find  the  trajectory  X*U)  that  maximizes/minimizes  the 
secondaryfunctional 

J(X(t),V(t)).  (7) 

The  problem  stated  previously  is  relevant  for  both 
industrial  and  mobile  robots.  In  the  case  of  point  mass 
mobile  robot  models,  however,  coupling  between  the  vari¬ 
ous  DOFs  occurs  through  the  state  inequality  constraints 
and  obstacles,  rather  than  in  the  robot  dynamics. 

This  paper  will  describe  an  algorithm  for  finding  a  time 
optimal  or  near  time  optimal  obstacle  avoidance  trajec¬ 
tory  for  a  robot  of  known  dynamics,  if  such  a  monotonic 
trajectory  exists.  For  reasons  of  simplicity,  the  algorithm 
will  be  illustrated  by  applying  it  to  an  autonomous  vehicle 
with  motion  being  constrained  to  the  two-dimensional 
(2-D)  XY  plane,  as  follows.  However,  the  algorithm  can 
be  generalized  to  a  robot  with  an  arbitrary  number  of 
degrees  of  freedom. 


III.  Slack  Set  Method 

The  slack  set  method  is  outlined  in  this  section.  The 
method  introduces  a  new  concept  for  severely  reducing 
the  search  space,  thus-minimizing  the  computational  cost 
of  deriving  such  a  trajectory.  Details  of  the  technique 
follow,  as  do  explanations  of  relevant  terms.  The  algo¬ 
rithm  attempts  to  derive  a  time  optimal  obstacle  avoid¬ 
ance  trajectory  for  a  robot  of  known  dynamic  capability. 
Through-out  this  paper  it  is  assumed  that  the  robot  and 
the  robot’s  environment  are  completely  known.  As  a 
result,  the  maximal  distance,  velocity  and  acceleration  of 
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Fig.  I.  Flowchart  of  the  slack  set  technique. 


the  vehicle  is  known.  Position,  size  and  shape  of  ail 
obstacles  are  also  considered  known. 

For  reasons  of  simplicity  the  slack  set  method  will  be 
applied  to  an  autonomous  vehicle  with  motion  being 
constrained  to  the  AY  plane,  with  bounded  distance, 
velocity  and  acceleration.  However,  the  algorithm  can  be 
generalized  to  a  robot  with  an  arbitrary  number  of  de¬ 
grees  of  freedom.  All  obstacles  are  rectangular  with  ran¬ 
dom  size  and  position  in  the  workspace.  When  overlap¬ 
ping,  they  produce  obstacles  of  random  size  and  shape. 
All  obstacles  are  considered  static.  The  slack  set  tech¬ 
nique  is  outlined  by  the  flowchart  of  Fig.  1. 

Decoupling  the  System 

Let  the  dynamic  model  of  a  planar  point  mass  robot  in 
a  two-dimensional  plane  be 

X 

vx  -/(*,i',Kt>r,,7,.jy)  (8) 

HI 


where  X,Y  denote  the  robot’s  position,  K*K  denote 
robot  velocity,  and  T,,Ty  denote  torque  along  the  X  and 
Y  axis  correspondingly. 

In  general,  (8)  is  a  coupled  nonlinear  four-dimensional 
(2-D)  differential  equation  describing  the  coupling  and 
nonlinearities  of  the  robot.  It  is  well  known  that  when  the 
robot  dynamics  are  known,  it  is  possible  to  employ  the 
Feedback  Linearizing  and  Decoupling  Transformation  to 
achieve  global  decoupling  of  an  N- DOF  system  into  N 
linear  subsystems  without  disregarding  robot  dynamics.  [6] 
As  a  result,  we  can  assume  that  the  motion  along  the  axes 
has  independent  dynamics,  however  cross  axis  coupling 
still  exist  through  the  presence  of  obstacles  as  will  be 
described  below. 

In  the  case  of  our  2DOF  system,  let  YxmM,Yymm, 
a.rnux>aym«  be  the  maximum  velocity  and  acceleration 
along  the  X  and  Y  axis  respectively.  We  assume  total 
knowledge  of  the  static  environment,  position  and  size  of 
all  obstacles.  The  robot’s  state  is  described  by 
S(XlY,Vx,Vy).  Given  an  initial  position  (x0,y0)  and  a 
goal  position  (xK,yx),  we  are  to  derive  a  trajectory  which 
would  take  the  robot  from  rest  at  (xu,y„)  to  rest  at 
ixx,.yg)  while  avoiding  all  obstacles  and  do  so  in  the  least 
amount  of  time. 

Applying  FLDT,  the  dynamics  (8)  can  be  expressed  as 
the  equations  for  two  double  integral  plants: 

X  =  K 

Kmax 

K  =  a,.  (9) 

where  the  admissible  set  (3)  is: 

|fl(v|  ^  |ffV|^QfymaX  Vl0) 

the  inequality  constraints  (4)  are: 

IK-I  <  Kmax  IK-1  <  Yymax  00 

The  cost  functional  which  we  choose  to  minimize  is  the 
time  needed  to  complete  the  task  (task  time),  of  taking 
the  robot  from  its  rest  at  its  initial  state  S,„  to  rest  at  its 
goal  state  Sx  while  avoiding  all  obstacles.  It  is  assumed 
that  the  task  is  started  at  time  t„  =  0  thus  (2)  becomes 

G-tf-t0-tt.  (12) 

Static  obstacles  can  be  expressed  as  functions  of  position 
only 

0,(X,Y )  =  (A1I  ( x,y )  points  within  the  tth 

obstacle  region}  ( 13) 

where  #-=*"1,2,  ■  •  •  d,  d  being  the  number  of  obstacles  in 
the  robots  workspace. 

The  goal  state  expresses  the  position  and  velocity  which 
the  robot  is  to  achieve  by  the  end  of  the  task  time  UK).  If 
the  goal  velocities  are  both  equal  to  zero,  then  the  goal 
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state  is 

SK  =  (A'g.y^.O.O).  (14) 

Using  the  previous  definitions  the  problem  can  be  stated 
as  follows. 

Find  the  input  a*(r),  t  e[0,  tf],  which  globally  mini¬ 
mizes  the  cost  functional  (12)  while  satisfying  (10),  and 
for  which  the  resulting  state  trajectory  S*(f )  satisfies  (9) 
and  (11),  avoiding  the  union  of  all  obstacles  (13)  and 
reaching  Sx  at  time  tx. 

Before  any  computer  algorithm  can  be  implemented, 
we  must  first  discretize  time.  For  simplicity  and  without 
loss  of  generality,  let  the  fixed  sampling  rate  AT  =  1. 
After  appropriate  scaling,  the  equations  of  motion  of  the 
robot  will  be 

X{k  +  \)  =  X{k)  +  Vx(k) 

Y(k  +  \)  =  Y{k)  +  Vy(k) 

K(k  +  \)  =  K(k)  +  a,(k) 

K(*  +  i)-K(*)  +  «,(*)  05) 

where  k  =  0  •  •  •  kv  and  kK  is  such  that  t v  =  r„  +  kv  AT. 

The  problem  to  be  solved  is  therefore  to  find  the 
sequence  (ax(k),al(k)),  where  k  =  0  ■  ■  •  kr  which  takes 
the  robot  from  rest  at  position  (A '„.YJ  at  k=0  and 
moves  it  to  position  (.rv.yt,)  at  k  =  kK.  while  avoiding  all 
obstacles  for  the  minimum  value  of  A\..  satisfying  all  state 
and  input  inequality  constraints.  Coupling  still  exists 
through  the  presence  of  obstacles. 

Critical  Axis  Handling 

Depicted  in  Fig.  2  is  an  example  of  a  time  optimal 
trajectory  planning  problem  in  the  absence  of  obstacles. 
Given  that  Ktmax  =  Vym,x  and  «rmax  =  a,  max  =  amM,  we 
wish  to  reach  state  Sg  from  state  So  in  minimum  time.  By 
neglecting  the  presence  of  all  obstacles  and  assigning  to 
each  actuator  maximum  control  (bang-bang)  a  lower 
bound  estimate  of  the  time  needed  to  complete  a  task  (T 
task)  can  be  calculated.  When  the  maximum  principle  is 
applied  to  the  problem  described  in  Fig.  2  the  time 
optimal  trajectories  of  Fig.  3  are  obtained  [6],  [9]. 

Trajectories  displaying  a  monotonically  decreasing  dis¬ 
tance  to  the  goal,  along  at  least  one  of  the  axis  are 
denoted  as  monotonic.  Let  us  assume  here  that,  Trlask> 


Fig.  3.  Bang-bang  profiles. 


Tytask  (Fig.  3).  The  task  is  complete  when  subtasks  along 
both  axis  are  complete,  thus  Tla^k  =  max(Trtask,Tyla5k). 
The  axis  with  the  maximal  task  time  is  called  the  critical 
actuator  (CA)  while  all  others  are  called  slack  actuators 
(SA)  (Fig.  3).  Inputs  of  the  slack  actuator  (Y  axis  here) 
can  be  modified  as  long  as  at  time  Trlask  we  have  y  =  yg 
and  Vy  =  0;  all  these  trajectories  have  1)  ^task  ~ 
max(7tUik,ryUlvk);  and  2)  T,  la,k  <  Trtask. 

We  name  the  set  containing  all  such  trajectories  the 
slack  set.  Note  that  the  slack  set  contains  a  set  of  mini¬ 
mum-time  trajectories  of  a  specified  task. 

This  multiplicity  of  trajectories  motivates  us  for  a  Closer 
examination  of  the  slack  set  (Fig.  4).  If  valid  trajectories 
exist  in  the  presence  of  obstacles  (traversable  paths),  then 
all  these  trajectories  will  be  time  optimal  obstacle  avoid¬ 
ance  trajectories. 

By  neglecting  the  presence  of  all  obstacles  and  assign¬ 
ing  the  maximum  control  torque  to  each  ac'ua.or  we  can 
determine  which  actuator’s  task  time  is  greatest  (Ttask). 
This  actuator,  called  the  critical  actuator  (CA),  exerts  a 
time  constraint  on  the  completion  of  the  \.,sk.  The  CA 
retains  its  bang-bang  profile  while  a  search  will  determine 
the  profiles  of  the  other  actuators.  We  assume  here,  with 
no  loss  of  generality,  that  T,ask  =  Ty  >  Ty .  thus  we  will 
have  a  fixed  bang-bang  profile  along  the  X  axis. 
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Fig.  5.  Transforming  the  problem,  (a)  Original  task,  (b)  Control  actuator  distance  profile,  (c)  Transformed  task. 


Transformation  from  N-DOF  to  (N-D-DOF 

We  can  take  advantage  of  the  a  priori  knowledge  of  the 
CA  profile  by  transforming  the  problem  from  the  XYVxVy 
state  space  to  the  slack  state  space  YVv.  This  is  done  by 
using  the  Critical  Actuator  distance  profile  in  conjunction 
with  the  knowledge  of  all  obstacle  positions. 

Let  Oxl,Ox2  be  the  projections  of  an  obstacle’s  bound- 
ary  along  the  X  axis  and  0yl,Oy2  be  the  projections  of 
the  obstacle’s  boundary  along  the  Y  axis  (Fig.  5).  Using 
the  Critical  Actuator’s  distance  profile  and  the  distances 
Ox  1,0*2  one  can  determine  during  what  times  r0,  and 
2  the  robot  will  be  in  the  projection  of  the  obstacle 
along  the  X  axis  (Fig.  5).  If  ( xn,yn )  defines  a  position  of 
the  robot  in  the  XY  plane  then: 

{Robot  in  Obstacle)  <=>  {*  e  [Ox  1  •  ■  ■  0x2]  and 

e[Oyl--OY2]]. 

The  obstacle  is  avoided  if  during  this  time  period 
[r4l,  •  •  ■  /()2]  the  robot’s  y  coordinate  of  position  is  not  in 
the  region  [Oyl,--Oy2]  (Fig.  5).  By  transforming  all 
obstacles  from  the  XYVKVr  to  YVy  slack  state  space  we 
severely  reduce  th?  state  space  size. 

Note  that  our  definition  of  an  obstacle  is  independent 
of  velocity.  As  a  result,  all  states  with  a  positional  vector 
located  in  an  obstacle  are  illegal  and  r s  such  can  not 
belong  to  the  Slack  Set.  Thus,  the  size  of  the  slack  set  is 
inversely  proportional  to  the  area  occupied  by  obstacles. 
Secondly,  we  can  model  complex  obstacles  such  as  mud, 
by  including  velocity  in  our  definition  of  an  obstacle.  For 
example  we  may  allow  the  robot  to  move  through  a 
certain  position  f  its  velocity  exceeds  some  defined 
“escape  velocity.”  Work  has  already  begun  to  include 
moving  obstacles. 

Searching  the  Slack  Set 

In  Older  to  derive  the  profile  along  the  slack  axis  (K) 
we  search  in  a  subset  of  the  YV.  state  space  called  the 
Slack  Set.  A  state  (y,t )  belongs  in  the  Slack  Set  it  tne 
following  two  conditions  are  satisfied:  I)  iy-Vgl^ 
^ — goal  Position  is  reachable,  and  2)  |r|  < 
a*ma*(^la»k  “  t)"goal  velocity  is  attainable.  The  slack  set 
is  searched  using  an  A *  heuristic  search  [24}.  As  men¬ 
tioned  previously,  any  trajectory  found  in  the  slack  set  will 
be  a  time-optimal  trajectory.  Fo>:  this  reason  we  need  not 
choose  to  use  heuristics  that  focus  on  minimizing  time. 
Dependent  upon  'he  heuristic  h(n )  being  used  in  the 


evaluation  of  each  state,  the  minimum-time  trajectory 
may  further  optimize  criteria  such  as  minimum  distance, 
smoothness  of  the  trajectory,  safety,  etc.  For  example: 

1)  f(n)  =  | Dx(n) |  min,  distance 

2)  f(n)  =  y'f*,/}*2]  +[A:2Du2]  +  [k,(Da2)] 

.  min.  dist.  in  State  Space. 

We  can  guarantee  that  any  trajectory  found  in  the  Slack 
Set  will  be  a  minimum-time  trajectory  because  we  have 
restricted  the  task  time  to  Tx. 

Unsuccessful  search  of  the  Slack  Set  —  Modified  Slack  Set 

Obviously,  the  task  time  of  the  time  optimal  trajectory 
in  the  presence  of  obstacles  may  exceed  the  task  time  of 
the  time  optimal  trajectory  in  the  absence  of  obstacles.  If 
in  the  presence  of  obstacles  all  trajectories  of  the  Slack 
Set  are  blocked,  no  trajectory  will  be  found  during  the 
search  of  the  Slack  Set. 

In  the  case  where  a  time-optimal  trajectory  is  not  found 
by  searching  the  slack  set,  the  task  time  is  increased  and 
each  actuator  is  sequentially  defined  to  be  the  temporary 
CA.  The  task  time-may  be  arbitrarily  increased  or  it  may 
be  increased  through  constraints  imposed  on  the  CA,  e.g., 
through  a  restriction  of  the  maximum  allowable  accelera¬ 
tion,  or  velocity.  We  define  the  modified  slack  set  as  the 
Slack  Set  which  is  created  through  an  increase  of  the  task 
time.  The  task  time  of  any  trajectory  belonging  to  a 
modified  slack  set  is  larger  than  the  task  time  of  any 
trajectory  of  the  slack  set. 

Consecutive  modified  slack  sets  are  generated  and 
searched  until  a  valid  trajectory  is  found.  Solutions  found 
using  the  modified  slack  set  are  not  guaranteed  to  be 
minimum-time  trajectories.  The  maximum  error  is  bound 
though,  and  is  equal  to  the  increase  of  the  original  task 
time.  A  sufficient  condition  for  the  convergence  of  the 
slack  set  technique  is  that  there  exists  a  monotonic  solu¬ 
tion.  If  no  solution  is  found  after  the  first  time  increment 
and  the  sequential  assignment  of  all  S'  -  1  axis,  in  turn,  as 
critical  axis,  the  time  is  incremented  again  and  the  entire 
process  is  repeated. 

IV.  Example 

This  paper  presents  examples  that  highlight  the  perfor¬ 
mance  of  the  slack  set  technique.  The  algorithm  attempts 
to  derive  a  time  optimal  obstacle  avoidance  trajectory  for 
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_ OBSTACLE  AUOIDRNCE  PATHS 

Vxm»x  *  6  Axmix »  3  If 

Vym»x  •  6  Aym*x  •  I  j 


Fig.  6.  Slack  set  Example  1. 


r  OBSTACLE  AU0I0ANCE  PATHS  1 

Vxmix ■  6 

Axmjx  •  3 

Vymix  •  6 

Aymix  •  2 

Fig.  7.  Slack  set  Example  2. 


a  robot  of  known  dynamic  capability.  Here  it  is  applied  to 
an  autonomous  vehicle  with  motion  being  constrained  to 
the  AY  plane,  with  bounded  distance,  velocity  and  accel¬ 
eration.  However,  the  algorithm  can  be  generalized  to  a 
robot  with  an  arbitrary  number  of  degrees  of  *reedom.  All 
obstacles  are  rectangular  with  random  size  and  position  in 
the  workspace.  When  overlapping,  they  produce  obstacles 
of  random  size  and  shape. 

We  are  in  search  of  the  inputs  needed  to  take  the 
vehicle  from  rest  at  an  initial  state  to  rest  at  a  goal  state 
and  do  so  in  minimum  time  while  avoiding  all  obstacles. 
The  Slack  Set  method  was  implemented  on  a  MAC  II 
computer  using  Turbo  Pascal. 

A  heuristic  search  is  used  in  searching  the  slack  set. 
Nodes  expanded,  are  those  nodes  that  are  examined 
during  the  search.  Nodes  generated  are  those  nodes  which 
appear  as  successors  of  the  nodes  expanded.  The  size  of 
the  state  space  and  the  number  of  nodes  examined  in 
order  to  find  a  solution  are  indicative  to  the  performance 
of  the  algorithm.  We  define  the  efficiency  of  the  search  as 
the  ratio  of  the  depth  of  the  search  (which  in  our  case  is 
equal  to  the  task  time),  divided  by  the  number  of  nodes 
expanded: 


Efficiency  - 


Depth  of  Search 
Number  of  Nodes  Expanded 


100%. 

(16) 


In  the  first  four  examples,  depicted  in  Figs.  6  through  9, 
we  are  looking  for  a  time-optimal  obstacle  avoidance 
solution  to  the  task  of  starting  from  rest  at  location 
(-75,  -40)  and  ending  at  rest  at  (81,6).  Variations  of  the 
robots  maximum  velocity  and/or  acceleration  result  in 
drastic  changes  in  the  trajectory  derived  by  the  Slack  Set 
technique.  The  ability  of  the  technique  to  accommodate 
secondary  constraints  is  illustrated  in  Figs.  10  and  11.  A 


Fig.  8.  Slack  set  Example  3. 


trajectory  obtained  by  searching  the  modified  slack  set, 
after  unsuccessful  searches  of  the  Slack  Set,  is  illustrated 
in  Fig.  12.  Trajectories  derived  through  the  Slack  Set 
technique  are  compared  with  those  derived  using  a  path 
derived  by  the  Kgraph  technique  in  Figs.  13  and  14. 
Finally,  Figs.  15  through  16  illustrate  the  performance  of 
the  slack  set  technique. 


A.  Effects  of  Robot  Dynamics 

Slack  Set’s  depeno-ttee  on  robot  dynamics  is  illustrated 
in  Figs.  6-9.  In  the  these  examples  we  are  looking  for  a 
time-optimal  obstacle  avoidance  solution  to  the  task  of 
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TABLE  I 

Slack  Set  Techniques  Dependence  on  Robot  Dynamics 


Example 

v 

imax 

v 

'un.u 

^  »ma* 

^  i  max 

#Nodes  in  Task 

#Exp.  Nodes 

<>Eff 

1 

6 

6 

3 

i 

3  277  908 

194 

14.1 

2 

6 

6 

3 

2 

6  555  816 

225 

12.2 

3 

6 

6 

2 

y 

6  555  816 

314 

9.1 

4 

6 

4 

3 

3 

6  555  816 

164 

17.1 

1  OBSTACLE  AU0I0ANCE  PATHS 

r 

-75 

. M 

*  -40* 

Xmix  *  100 

Ym*x  ■  75 

UEL0CITV  PROFILES 

8 

msfmmSBR 

Fig.  9.  Slack  set  Example  4. 


Fig.  II.  Time  optimal  obstacle  avoidance  trajectory  with  secondary 
constraint  on  acceleration. 


OBSTACLE  AUOIDRNCE  PATHS 


Fig  10  Time  optimal  obstacle  avoidance  trajectory  with  secondary 
constraint  on  distance. 


starting  from  rest  at  location  (-75.  -40)  and  ending  at 
rest  at  (81,6).  Variations  of  the  robots  maximum  velocity 
and/or  acceleration  result  in  drastic  changes  in  the  tra¬ 
jectory  derived  by  the  Slack  Set  technique.  Results  of 
variations  in  maximum  acceleration  are  depicted  in  Ex¬ 
amples  1,  2  and  3.  while  example  4  show  results  of 
variation  in  maximum  velocity. 

The  obstacle  avoidance  paths  illustrate  the  paths  de¬ 
rived  by  the  slack  set  technique.  Velocity  Profiles  illus¬ 
trate  the  profiles  of  the  critical  and  slack  axes.  All  obsta¬ 
cles  are  rectangular  and  are  assumed  fixed.  In  every 
example  shown  the  critical  axis  in  the  X  axis  and  the 
Slack  Axis  is  the  Y  axis.  The  critical  axis  velocity  profile  is 
always  trapezoidal  as  desired  by  the  Slack  Set  technique. 
The  slack  axis  velocity  profile  is  drawn  with  the  thicker 
line.  No  secondary  constraints  are  imposed. 

Dependence  of  the  Slack  Set  technique  on  robot  dy¬ 
namics  is  illustrated  in  Table  I  where  four  variations  of 
the  robots  dynamics  resulted  in  four  different  trajectories. 
It  is  important  to  notice  the  difference  which  exists  be¬ 
tween  trajectories  which  are  derived  with  accommodation 
of  robot  dynamics  and  those  which  rely  purely  on  the 
geometry  of  a  particular  task.  One  can  see  that  although 
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(b)  (c) 

Fig.  12.  Trajectory  found  by  searching  modified  slack  set,  <a)  Derived 
trajectory,  (b)  Unsuccessful  search,  (c)  Successful  search. 


Examples  1  through  4  display  the  same  geometric  proper¬ 
ties,  each  trajectory  displays  different  velocity  and  dis¬ 
tance  profiles.  On  the  other  hand,  all  examples  would  be 
seen  as  the  same  problem  by  the  f'graph  technique  which 
relies  purely  on  task  geometry. 

System  dynamics  affects  the  size  of  the  state  space. 
Using  the  Slack  Set  technique  one  severely  reduces  the 
number,  of  nodes  generated  and  examined.  From  Table  I. 
we  can  see  that  the  Slack  Set  technique  examines  less 
than  0.006%  of  the  state  space.  The  efficiency  of  the 
search  is  expressed  by  the  ratio  of  the  number  of  nodes 
expanded  divided  by  the  depth  of  the  search.  In  these 
examples  since  we  set  the  sampling  rate  to  unity,  the 
depth  of  the  search  is  equal  to  the  task  time. 


Fig.  13.  Comparison  between  a  slack  set  derived  minimum  time  i  .  ,c- 
tory  and  a  Vgraph  derived  minimum  distance  trajectory. 


Fig.  14.  Comparison  between  a  slack  sei  dt lived  minimum  lime  trajec¬ 
tory  and  a  Vgraph  d:nved  mimmun  distan  t  trajectory 


B.  Multiobjectice  Optimal  Navigation  via  .r,lack  Set 

In  the  following  paragraphs  we  illustrate  the  capability 
of  the  slack  set  technique  to  derive  time  optimal  trajecto¬ 
ries  which  optimize  secondary  constraints.  This  is  possible 
because  the  Slack  Set  usually  contains  more  than  one 
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Pg,  15.  Slack  Set  performance  verses  workspace  density. 

time  optimal  trajectory.  As  a  result  the  slack  set  may  be 
searched  using  heuristic  functions  that  optimize  con¬ 
straints  other  than  time  constraints  e.g.,  maximum  safety 
by  maximizing  the  minimum  distance  from  the  closest 
obstacle,  smoothness  via  minimization  of  jerk,  accelera¬ 
tion,  velocity,  etc. 

Figs.  10  and  11  depict  minimum-time  trajectories  of  the 
same  task  with  secondary  constraints  of  minimum-dis¬ 
tance  and  minimum-acceleration  respectively.  One  can 
see  how  each  trajectory  differs  significaintly  from  the 
other.  The  trajectory  of  Fig.  10  has  a  secondary  constraint 
on  distance.  As  a  result  the  path  derived  in  Fig.  10  is 
shorter  than  the  path  derived  in  Fig.  1 1.  By  maintaining  a 
constant  velocity  as  long  as  possible  the  acceleration  is 
equal  to  zero  as  long  as  possible  (Fig.  ID- 

C.  Unsuccessful  Slack  Set  Searches — Modified  Slack  Set 
Trajectories 

Not  every  search  of  the  slack  sti  is  successful.  When 
such  a  situation  presents  itself,  the  slack  set  algorithm 
searches  the  modified  slack  set,  which  is  generated  by 
increasing  the  task  time  along  the  critical  axis  of  motion 
by  a  fixed  amount.  This  fixed  increase  of  the  task  time 
along  the  critical  axis  provides  us  uith  an  upper  bound  on 
*he  time  difference  between  the  derived  task  time  arj  the 
actual  minimum  task  time.  Sequential  switching  of  the 
critical  axis  amoag  the  axis  of  the  robot  guarantees  the 
derivation  of  a  feasible  monotouL  path,  if  .-,uch  a  path 
exists. 

A  task  is  depicted  in  Fig.  12,  Initially  Arm„  =  3.  Exami¬ 
nation  of  all  slates  in  the  Slack  Set  does  not  produce  a 
time  optsmai  obstacle  avoidance  trajectory.  The  maximum 
acceleration  along  the  Critical  Axis  is  decremented 
"•  2),  thus  producing  a  longer  task  time,  and  the 
Modified  Slack  Set  Is  examined.  Once  again  the  siack  set 
is  searched  without  producing  a  trajectory  (Fig.  12(c)). 
For  reasons  of  simplicity,  sequential  switching  of  the 
critical  axis  will  not  be  illustrated  in  this  example. 

Once  again  the  maxirum  acceleration  along  the  critical 
axis  is  ••educed  and  the  Modified  Siack  Set  is  searched.  A 
trajectory  is  found  (Fig.  12(c))  and  we  can  guarantee  that 
the  maximum  error  is  smaller  than  the  difference  between 


Fig.  16.  Processing  time  comparison  for  slack  set  and  Kgraph  trajecto¬ 
ries. 

the  initial  task  time  and  the  derived  task  time.  As  a  result . 
we  can  guarantee  that  the  trajectory  found  in  Fig.  12  has 
a  maximum  error  of  12%.  If  a  solution  has  not  been 
found  for  a  maximum  acceleration  of  the  critical  axis 
equal  to  the  minimum  acceleration  of  the  axis,  we  would 
restore  the  maximum  acceleration  to  its  initial  value  and 
decrement  the  maximum  velocity  of  the  critical  axis. 

D.  Performance  Comparison  of  the  Slack  Set  and  Vgraph 
Techniques 

Fig.  13  depicts  the  results  of  a  comparison  between  the 
trajectory  found  using  the  Slack  Set  algorithm  and  the 
Kgraph  algorithm.  Ir  order  to  derive  the  minimum  time 
trajectory  which  would  follow  the  minimum  distance  path 
we  did  the  following.  First,  we  determined  the^minimum 
distance  path  using  tht  Kgraph  technique.  Next,  we  de¬ 
fine  a  XYVxVy  state  space  with  position  on  or  close  to  the 
minimum  distance  path  and  valid  velocity.  The  minimum 
time  trajectories  corresponding  to  the  Kgraph  paths  were 
determined  using  A*. 

Fig.  13,  emphasizes  the  distinction  between  the  mini¬ 
mum-time  trajectory  and  the  minimum-distance  trajec¬ 
tory,  showing  the  minimum-time  trajectory  to  have  a 
smaller  task  time  than  the  minimum-distance  trajectory. 
The  task  times  are  shown  to  be  34  and  39  sTor  the  xlack 
set  and  the  Kgraph  techniques  correspondingly.  One  can 
also  see  how  the  paths  derived  using  the  slack  set  are 
smooth  curves  compared  to  the  paths  obtained  using 
Kgraph  that  are  straight  line  segments  with  sharp  turns, 
requiring  complex  velocity  profiles.  The  velocity  proliles 
required  by  Kgraph  can  be  simplified  by  smoothing  the 
derived  path,  but  this  would  require  additional  computa¬ 
tion.  The  velocity  profiles  show  that  slack  set  produces 
smoother  icsuits,  a!w<.,s  having  one  axis  (*he  ,V  axis  in 
these  examples)  following  a  bang-bang  profile.  It  is  inter¬ 
esting  to  note  here  that  it  seems  that  Kgraph  also  tries  to 
follow  a  bang-bang  profile  at  least  alone  one  axis. 

Fig.  14  depicts  another  example  were  the  trajectory 
derived  using  the  $  lack  set  technique  has  a  smaller  task 
time  than  the  trajectory  derived  using  the  Kgraph  tech¬ 
nique.  This  example  also  illustrates  slack  set's  ability  to 
utilize  system  dynamics  to  its  fullest.  For  example  (he 
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slack  set  trajectory'  reaches  and  maintains  the  maximum 
velocity  along  the  slack  axis  during  the  initial  stage  of  the 
trajectory.  This  allows  the  velocity  profile  along  the  slack 
axis  to  terminate  21  units  of  time  into  the  task.  On  the 
contrary,  the  Kgraph  trajectory  is  in  motion  along  both 
trajectory  until  the  completion  of  the  task.  The  X  axis  of 
the  graph  denotes  the  percentage  oi  the  critical  area  that 
is  covered  by  obstacles.  The  various  shades  of  each  col¬ 
umn  denotes  size  of  the  determined  task  time  with  re¬ 
spect  to  the  time  optimal  task  time  (T*)  in  the  absence  of 
obstacles.  As  the  area  occluded  by  obstacles  increases 
from  5%  to  35%  we  see  the  percentage  of  trajectoi ies 
derived  with  time  optimal  task  time  decreases.  One  very 
interesting  result  of  this  experiment  was  that  the  slack  set 
technique  never  failed  to  find  a  feasible  monotonic  trajec¬ 
tory. 

Fig.  16  was  obtained  by  comparing  the  processing  times 
needed  in  deriving  a  trajectory  for  a  given  task  using  the 
slack  set  technique  and  the  k'graph  technique.  Both  tech¬ 
niques  seem  to  require  more  computations  as  the  number 
of  obstacles  is  increased,  though  the  search  space  of  the 
slack  set  is  inversely  proportional  to  the  mzc  of  the 
occluded  area  and  the  size  of  the  l  graph  technique  is 
proportional  to  the  number  of  obstacles  in  the  workspace. 

Though  implementations  of  each  technique  may  not  be 
optimal,  it  seems  that  the  computational  complexity  of 
the  |/graph  technique  increases  faster  than  the  slack  set 
technique. 

VII.  Conclusion 

The  slack  set  technique  was  introduced  in  this  paper. 
This  technique  exploits  the  natural  redundancy  of  a  robot 
system,  the  fact  that  a  time  optimal  trajectory  tor  a  given 
task  is  not  in  general  unique,  optimal  control  theory 
(Pontryagin's  maximum  principle)  and  linearization  and 
decoupling  techniques  (FLDT),  to  reduce  the  size  of  the 
search  space. 

The  slack  set  technique  is  able  to  dense  a  time-optimal 
obstacle  avoidance  trajectory  for  a  i.'N>t  when  sueh  a 
monotonic  trajectory  exists  in  the  s|.i,a  >st  Otherwise  a 
feasible  trajectory  will  be  found  it  «uih  a  munutomc 
trajectory  exists. 

Time  optimal  trajectories  derived 
ondary  constraints  by  searching  the  n  .O  vt  using 
heuristic  functions  that  optimize  critt.  r .» >n  .a  her  th.m  time, 
e.g.  maximum  safety  by  maximizing  the  minimum  distance 
from  the  closest  obstacle,  smoothness  >  j  minimization  of 
jerk,  acceleration  and  velocity,  etc. 

Trajectories  derived  using  the  skulk  ut  Uchniquc  ac¬ 
commodate  robot  dynamics  and  as  a  jcmiI*  duplav  Muixnh 
velocity  and  distance  profiles.  On  the  « >t her  hand,  tech¬ 
niques  that  rely  purely  on  the  geometry  of  the  task 
generate  paths  which  in  most  cases  have  sharp  turns 
(unless  a  smoothing  algorithm  is  employed)  and  velocity 
profiles  which  are  hard  to  follow. 

For  the  slack  set  technique,  the  number  of  legal  states 
in  the  state  space  decreases  as  the  number  of  obstacles  in 


the  workspace  increases.  Thus,  the  size  of  the  search 
space  is  inversely  proportional  to  the  number  of  obstacles. 
In  Kgraph,  the  size  of  the  state  space  is  proportional  to 
the  number  of  obstacles  in  the  workspace.  Task  times 
derived  using  the  slack  set  technique  also  seem  to  have  a 
linear  relationship  with  respect  to  the  density  of  the 
occluded  area  of  the  workspace.  The  percentage  of  trajec¬ 
tories  derived  with  time  optimal  task  times  decreases  as 
the  area  occluded  by  obstacles  is  increased. 

Though  implementations  of  each  technique  may  not  be 
optimal,  it  seems  that  the  computational  complexity  of 
the  k'graph  technique  seems  to  increase  faster  than  the 
slack  set  technique. 

It  was  shown  in  the  examples  of  Section  IV,  that 
trajectories  comprised  of  the  minimum  distance  path  and 
velocity  profiles  required  for  the  minimum-time  traversal 
of  the  minimum  distance  path  are  not  guaranteed  to  be 
time  optimal  trajectories.  It  is  important  to  notice  the 
difference  which  exists  between  trajectories  which  are 
derived  with  accommodation  of  robot  dynamics  and  those 
which  rely  purely  on  the  geometry  of  a  particular  task. 
One  can  see  that  although  Examples  1  through  4  display 
the  same  geometric  properties,  each  trajectory  displays 
different  velocity  and  distance  profiles.  On  the  other  hand 
all  four  examples  would  be  seen  as  the  same  problem  by 
the  l^graph  technique  which  relies  purely  on  task  geome¬ 
try. 

Not  every  search  of  the  slack  set  is  successful.  When 
such  a  situation  presents  itself  the  slack  set  algorithm 
searches  the  modifie  i  slack  set,  which  is  generated  by 
increasing  the  task  time  along  the  critical  axis  of  motion 
by  a  fixed  amount.  This  fixed  increase  of  the  task  time 
provides  us  with  an  upppr  bound  on  the  time  difference 
between  the  derived  task  time  and  the  actual  minimum 
task  time. 

All  trajectories  derived  via  the  slack  set  technique  are 
monotonic.  Degradation  of  the  task  time  performance  of 
the  derived  trajectory  will  occur  when  the  critical  axis 
velocity  profile  is  forced  to  have  a  very  small  value. 

Obstauco  moving  along  a  known  trajectory  can  easily 
be  included  by  the  slack  .set  method.  This  may  he  accom¬ 
plished  using  swept  volumes  or  through  the  inclusion  of 
tunc  in  the  transformation  of  the  N-DOF  problem  to  the 
i  VI)  DOF  problem.  Accommodation  ot  moving  obstacles 
has  begun  and  will  be  presented  in  future  work. 

The  slack  set  technique  was  implemented  using  Turbo 
Pascal  on  a  Mac  II.  time  pertuimanoe  of  the  technique 
seldomly  exceeded  a  few  minutes,  which  includes  some 
quite  intense  graphics.  The  size  of  the  state  space  was  in 
the  range  of  107  states,  the  tree  structure  being  searched 
was  in  the  range  of  It)’’  nodes. 
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Abstract 

In  this  article,  we  explore  the  relationship  of  learning  and  adaptation  in  robot  control. 
Learning,  in  this  context,  is  the  process  of  identifying  the  robot  dynamics  and  its  interaction  with 
the  environment  for  the  purpose  of  improved  tracking  over  an  infinite  horizon.  Whereas, 
adaptation  is  the  process  of  adjusting  the  controller  to  comply  with  the  stabilization  (regulation  and 
tracking)  needs  of  the  closed  loop  system  for  the  present  finite  horizon  problem.  We  thus 
demonstrate  that  learning  conflicts  with  adaptation  in  its  tendency  to  increase  the  present  tracking 
error,  due  to  the  minimization  of  different  criteria  (Dual  control  principle). 

Exploratory  Schedules  (ES)  are  reference  trajectories  which  are  specifically  designed  to  provide 
efficient  closed  loop  learning.  We  show  how  the  design  of  ES  is  an  essential  aspect  of  learning 
which  has  been  neglected.  We  relate  ES  design  to  the  issue  of  input  richness  (or  persistent 
excitation).  Our  ES  represents  a  weaker  criteria  than  persistent  excitation 

The  robot  model  parameters  are  viewed  as  state  variables.  They  are  used  to  form  the 
augmented  state  space  in  which  asymptotic  stability  of  the  origin  implies  both  asymptotic  learning 
and  adaptation  A  theorem  regarding  constructive  sufficient  conditions  for  asymptotically  stable 
closed  loop  learning  is  proved,  and  examples  of  learning  in  1  and  2  degree  of  freedom  (DOF) 
manipulators  are  given. 
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Adaptive  control  of  robot  manipulators  has  been  the  subject  of  much  research  in  recent  years  (  , 
see  [Hsia]).  Adaptation,  in  control,  is  the  process  of  adjusting  the  controller  to  comply  with  the 
regulation  and  tracking  requirements  of  the  closed  loop  system.  Direct  adaptive  controllers,  e.g. 
[Li-Slotine  88],  use  tracking  errors  of  the  joint  motion  to  direct  the  robot  model  parameter 
adjustment  The  direct  adaptive  controllers  are  based  on  the  full  dynamic  model  of  the  robot 
Learning,  in  this  context  is  the  identification  of  the  true  values  of  the  manipulator  parameters  in 
closed  loop  operation. 

In  operation,  the  controller  is  given  a  trajectory  by  a  path  planner  in  order  to  accomplish  some 
useful  task.  The  controller  then  adapts  the  parameters,  on  line,  so  as  to  satisfy  the  tracking 
requirements.  If  the  parameters  are  not  known  exactly,  there  will  be  a  transient  period  of  tracking 
error  while  adaptation  occurs.  So  that  identification  of  the  true  parameters  is  desirable  for 
increased  tracking  precision. 

Present  robot  adaptive  controllers  (see  for  example  [SIotine-Li  87  ],  [Craig],  [Ortega-Spong]) 
use  robot  parameter  update  laws  which  rely  on  the  assumption  that  the  parameters  are  constants. 
Also,  to  guarantee  learning,  a  trajectory  which  is  "rich"  enough  to  expose  all  the  dynamics  of  the 
manipulator  is  required  for  convergence  of  the  parameter  estimates  to  their  true  values.  Richness 
of  input  refers  to  trajectories  such  that  the  internal  signals  of  the  parameter  adjustment  mechanism 
is  persistently  exciting  (see  [Narendra-Annaswamy]).  If  the  trajectory  is  not  persistently  exciting, 
stability  is  assured  but  not  learning.  Thus  future  tasks  are  performed  with  the  same  transient 
tracking  errors. 

We  propose  (see  figure  1 . 1 )  that  whenever  it  is  possible  to  modify  the  input  e.g.  prior  to  putting 
the  manipulator  to  work  doing  useful  tasks  or  periodically  when  parameters  change  and  the  path 
planner  is  not  demanding  a  new  path,  that  there  will  be  a  learning  period  where  the  controller  learns 
the  true  values  of  the  parameters  by  tracking  artificially  designed  Exploratory  Schedules  (ES).  ES 
are  trajectories  specifically  designed  for  asymptotic  learning  of  the  system  dynamics.  Any 
trajectory  that  is  persistently  exciting  could  be  used  as  an  exploratory  schedule.  However,  the 
synthesis  of  persistently  exciting  trajectories  is  generally  not  straightforward.  In  this  article,  we 
show  how  to  construct  Exploratory  Schedules  which  guarantee  closed  loop  learning.  Our  ES  are 
not  necessarily  persistently  exciting. 

The  rest  of  the  article  is  formatted  as  follows.  Section  2  presents  the  general  structure  and 
properties  of  a  rigid  robot.  Section  3  defines  the  adaptive  control  structure  used,  and  presents 
sufficient  conditions  for  global  asymptotic  learning  of  parameters  which  is  not  explicitly  based  on 
persistent  excitation.  Section  4  describes  simulation  results  of  learning  with  1  and  2  degree  of 
freedom  (DOF)  manipulators.  Section  5  concludes  the  article. 
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figure  1.1:  Block  diagram  of  the  proposed  system. 


2,1  Lagrange-Euler  Representation  of  Rigid  Robot  Dynamic 

A  rigid  robot  is  defined  as  an  open  kinematic  chain  of  rigid  links,  which  are  joined  by  prismatic 
or  revolute  joints.  v 

The  Lagrange-Euler  formulation  of  robot  dynamics  results  in  a  closed  form  solution  (the 
agrange-Euler  formulation  is  well  known  in  robotics  literature,  for  a  detailed  derivation  of  the 
closed  form  solution  see  references  (Fu  et  al.],  [Paul  81],  [Asada-Slotine]).  The  formulation 
requires  an  understanding  of  the  spatial  relationship  between  the  links  of  the  robot  This 
relationship  is  commonly  represented  by  4  x  4  homogeneous  transformation  matrices  (see 

reference  [Paul  81]),  which  relate  the  position  and  orientation  of  the  ith  link  coordinate  system  to 
the  (t-  l)th  coordinate  system. 

The  closed  form  solution  obtained  by  the  Lagrange-Euler  formulation  has  the  general  matrix 
torm  of 

D(q)q  +  C(q.q)q  +  G(q)  =  x  (2.1) 

where 


D(q) ,  ls  311  n  x  n  matrix  of  terms  related  to  inertial  forces  acting  on  the  robot, 

C(q,q)  is  an  n  x  n  matrix  of  terms  related  to  coriollis  and  centrifugal  forces 

G(q)  is  an  n  x  1  vector  related  to  gravitational  forces, 

q  is  an  n  x  1  vector  of  generalized  joint  coordinates, 
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x  is  an  n  x  1  vector  of  generalized  forces/torques, 
n  is  the  number  of  degrees  of  freedom. 

The  formulation  results  in  n  second-order,  coupled  differential  equations.  If  permanent- 
magnet-actuator  dynamics  are  included  in  the  formulation,  another  n  first  order  differential 
equations  are  needed  to  describe  the  robot.  Typically,  a  robot  has  6  DOF,  which  results  in  an  18th 
order  system  which  is  coupled,  time-varying,  and  highly  nonlinear.  Furthermore,  the  dynamics 
are  dependent  on  the  mass  distribution  of  the  robot,  the  exact  values  of  which  are  rarely  known  at 
the  time  of  controller  design,  and  may  change  during  use. 

2.2  Properties  of  Robot  Dynamics 

It  has  been  pointed  out  by  a  number  of  authors  (see  for  example  [Ortega-Spong]),  that  there  are 
properties  of  the  dynamics  (2.1)  that  can  be  exploited  for  robot  control.  Three  of  the  properties  are 
repeated  here,  as  they  will  be  used  in  other  sections. 

Property  1:  D(q)  is  symmetric  and  positive  definite  V  q  =>  D-1(q)  exists  V  q. 

Property  2:  D(q)  -  2C(q,q)  is  skew  symmetric  =>  qT(  D(q)  -  2C(q,q)  )q  =  0  V  q. 

Property  3:  x  =  D(q)q  +  C(q,q)q  +  G(q)  =  Y(q,q,q,q)0 

where  Y(q,q,q,q)  is  an  n  x  r  matrix  of  known  functions  and  0  is  an  r  x  1  vector  of  constant 
parameters.  Property  3  implies  that  the  robot  dynamics  may  be  viewed  as  a  linear  operator  from 
the  parameters  to  the  joint  torques. 

3.  Closed  Loop  Learning  Via  SelectiQn_oLExDlQratQrv  Schedules 
3.1.  Controller  Structure 

In  [Slotine-Li],  an  adaptive  controller  which  guarantees  global  asymptotic  tracking  was 
formulated.  The  controller  is  based  on  knowledge  of  the  robot  dynamics  structure  and  the  use  of 
sliding  surface  control.  A  major  advantage  of  this  controller  is  that  there  is  no  need  to  measure  the 
joint  acceleration.  The  convergence  of  the  parameter  estimates  to  their  true  values,  however  is  not 
guaranteed.  In  this  work,  the  same  control  structure  is  used.  However,  we  synthesize,  whenever 
possible,  the  reference  input  to  guarantee  both  tracking  and  learning. 

For  the  system  of  equation  (2.1).  define  the  virtual  reference  trajectory 

qr  =  qj  •  A  Jo  e  dt  (3.1) 

qr  =  irAe 

qr  =  qa-Ai 

where  qd  is  an  n  xl  vector  of  desired  joint  coordinates,  e  =  q  -  q^ ,  and  A  is  a  positive  definite 
matrix  with  constant  coefficients.  Define  the  sliding  surface 


4 


(3.2) 


s  =  er  =  q  -  qr  =  e  +  Ae 
s  =  er  =  q  -  qr  =  e  +  Ae  . 

Let  the  control  input  be 

~t  =  D(q)qr  +  6(q,q)qr  +  3(q)  -  Kds  =  Y(q,q,q,,qr)8  -  Kds 

where  (0  is  the  estimate  of  (.),  and  Kd  is  an  n  x  n  positive  definite  matrix. 

The  parameter  adaptation  law  is 


£}  =  $  =  -  Ka_1  YT(q,q,qr,qr)s 


where  the  estimation  enror  is  defined  as  (.)  =  (.)  -  (.),  and  Ka  is  an  r  x  r  positive  definite  matrix 
with  constant  elements. 


3.2.  Sufficient  Conditions  to  Guat 

3.2.1.  Augmented  State  Space 
Define  the  augmented  state  space 


ting  or  Rigid  Robot  Dynamic 


e 

[xl] 

Xt 

X  = 

e 

= 

0 

.x3l 

Then  the  augmented  system  is 


' 

Xt 

X1 

x2  = 

e 

X3J 

Where  $  is  as  defined  as  in  eq  (3.4).  To  find  e»  combine  equations  (2.3)  and  (3.3), 

•  D(q)q  +  C(q,q)q  +  G(q)  =  D(q)qr  +  C(q,q)qr  +  G(q)  -  Kds  .  (3.7) 

Rewrite  (3.7)  in  terms  of 

e  =  q  -  qd  q  =  e  +  qd , 
e  =  q-qd=*q  =  e  +  qd, 
e  =  q  -  qd  =>  q  =  e  +  qd . 
s  =  e  +  Ae , 

qr  =  ^  - A  e  - 

qr  =  qd-Ae, 

D(  e+qd )( e+qd  )+C(  e+qd  ,e+qd )( e+qd  )+G(  e+qd )  = 
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*  D(  e+qd )( q^-Ac  )+£(  e+qd  ,e  +  qd)(  q*- Ae:)+3(  c+qd)  -K^e-.-  Ae ) . 

Then 

D(  e4qd)e  =  D(  e-fqd  )%+€(  e-Kjd  ,e  +  qd  )qd+d(  e+qd )  -  D^+qj  )Ae  -  g(  e+qd  ,e  +  qd  )Ae 

-C(e+qd,e+qd)e-Kd(e  +  Ae).  (3,8) 

Note  that  Yfo.q.q^q^  =  DCq)^  +  ftq.q)^  +  0(q), 

=  D(  e+qd  )qd+€(  e+qd  ,e  +  qd  )%*€(  e+qd ), 

=  Y(e+qd,e  +  qd,qd,q<^  , 

Y(q,q,qr,qr$  =  D(q)qr  +  C(q,q)qr  +  0(q), 

=  D(  e+qd )( c^-Ae  )+£(  e+qd  ,e  +  qd)(  ^j-Ae  )+Q(  e+qd ), 

=  Y ( e+qd  ,e  +  qd  .qj-Ae  ,qd-Ae  )0  , 
and 

D(q)  >  0  V  q  =>  D(  e  +  qd)  >  0  V  e  +  qd. 

Then  (3.8)  can  be  written  as 

D(  e+qd)e  =  Y(  e+qd  ,e  +  qd  ,qd  ,qd)0  -  D(  e+qd)Ae  -  C(  e+qd  ,e  +  qd  )Ae 

-  C(  e  +qd  ,e+qd  )e  -  Kd(  e  +  Ae ),  (3.9) 

and  e  is 

e  =  D(  e+qd)*l[Y(  e+qd  ,e  +  qd  ,qd  ,<^0  -  D(  e+qd  )Ae  -  C(  e+qd  ,e  +  qd)Ae 

*  C(  e  *qd  ,e+qd  )e  -  Kd(  e  +  Ae )] .  (3.1  Q) 

Then  the  augmented  system  is 
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x2 


x  = 


DCx^q^ 


-l|  Y(x1+qdx2fqdqdq^x3  -D(x  l+qd>Ax2  -C(x  j+q^Xo+q^Axj 
I  -C(x1+qd,x2+q(^x2  -K^tAx^ 


*  Kjl  Y^Xi+qjX^^-Ax!,  qd-Ax2)  (x2+Axj  ) 


(3.11) 

Equation  (3.1 1)  describes  the  closed  loop  dynamics  of  the  augmented  system.  We  emphasize  that 
asymptotic  stability  of  its  origin  implies  both  asymptotic  tracking  and  asymptotic  learning. 

3.2.2  Global  asymptotic  stability  of  the  augmented  system. 


Thfflrctq 
If  r  <1  n,  and  if 


Rank 


YUdV<Md> 

lim  t->  oo 


then  the  origin  of  system  (3. 1 1 )  is  globally  asymptotically  stable 
PrpQf 

Write  system  dynamics  ( eq.  (2.3))  in  terms  ofiq  =  s  +  qr,  q  =  s  +  qr  and  substitute  for  t  using  eq 
(3.1)  to  (3.4) 


D(q)(  s  +qr)  +  C(q,q)(  s  +  qr )  +  G(q)  =  D(q)qr  +  C(q,q)qr  *G(q)  -  Kds, 

D(q)s  =  D(q)qr  -  Drq)tir  *  C(q.q)qr  -  C(q,q)qr+  8(q)  -  G(q)  -  C(q.q)s  -  Kds, 

D(q)s  =  D(q)qr  +  Cfq.qHir  -  G(q)  -  C(q.q)s  -  Kds  .  (3.12) 

Define  the  Lyapunov  candid.;'.' : auction 

V  =  1/2  sTD(q)s  +  1/2  0-K.0  .  (3.13) 

which  is  positive  definite.  The  der.vative  of  V  is 

V=  1/2  s1'D(q)s  +  1/2  sTD<q.s  *  1/2  s'f'DkjiS  -  2  dTK^  +  1/2  0TKa0  .  (3. 14) 
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The  terms  in  (3.14)  are  scalars  and  therefore  symmetric,  then  ^  can  be  written  as 

V*  sTD(q)s  +  1/2  s^6<q)s  +  .  (3.15) 

Substituting  for  Ds  from  (3. 12) 

\^=  sT(  D(q)qr  +  £(q,q)qr  +  3(q)  -  C(q,q)s  -  K<is  ]  +  1/2  sT6(q)s  +  GfTKjzf  . 

Combining  and  rearranging  terms  and  using  the  property  sT(D(q)  -  2C(q,q)]s  =  0, 

V  =  sT[  D(q)qr  +  2(q,q)  %  +  G(q)  ]  -  sTKd  s  +  1/2  ST[D(q)  -  2C(q,q)]  s  +  . 

V  *  -  sTKdS  +  sT[  D(q)qr  +  2(q,q)  4-  +  G(q)  ]  +  $TK  Ji  .  (3.16) 

Which  can  be  written  as 

V  =  -  sTKdS  +  0^YT(q>q,qr,qr)S  +  0TfCa$  .  (3. 17) 

Substituting  for  fy  from  (3.4) 

</  =  -  sTKds  +  0TYT(q.q,qr,qr)s  -  8TKa  Ka*1  YT(q,q,qr,qr)s, 

V  =  -  sTKds£0.  (3.18) 

V<0Vs*0=>s->0. 

When  s  =  0,  Xi  =-Axj  =»  X|  and  x_.  approach  the  origin  exponentially  on  the  sliding  surface  s. 
This  in  turn  from  eq.  (3.1 1)  implies  that  X3  ->  0  . 

Then 


(3.19) 


holds  implies  that  X3  ->  0 . 
QED 
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TT  Selection  of  ES' 


An  implication  of  the  theorem  is  that  if  r  >  n,  a  sufficient  condition  for  k  £n  parameters  to  be 
identified  is  that  they  are  are  not  in  the  null  space  of 

lim  t->  oo 

Which  implies  that  a  desired  trajectory  may  be  chosen  such  that  columns  corresponding  to  the  k  In 
parameters  are  linearly  independent,  which  will  guarantee  that  the  parameters  are  identified.  So 
that  different  time  sections  of  the  Exploratory  Schedule  (which  specifies  qd,qd,qd  )  may  be 
designed  to  leam  different  components  of  the  vector  x3  it  its  dimension  exceeds  n. 
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The  first  simulation  was  done  for  a  single  link  manipulator  as  shown  in  figure  4. 1 


figure  4.1:  Simulated  1  DOF  manipulator. 

The  dynamics  of  the  manipulator  are 

t  =  md2q  +  mgd sin(q),  (4  j,y 

where,  d  =  1,  g  =  10,  and  m  =  2  .  The  matrix  of  known  functions  is 

Y(  q,  q )  =  [  d2  q  +  g  d  sin(q)],  (4  2) 

with  m  as  the  single  parameter  to  be  estimated.  The  control  law  as  specified  by  eq  (3.3)  is 

x  =  [  d2  q,  +g  d  sin(q)]m  -  kd(  q  -  q,),  (4  3)i 

where  qr  =  qd  -  (q  -  qd),  kd  =  1,  and  the  parameter  adaptation  law  as  defined  by  (3.4)  was  useduo 
estimate  m,  where  the  estimate  is  denoted  as  m  . 

In  experiment  la, 


qd  =  0.0,  qd  =  0.0  =>  Rankf  Y(qd^  )  =0 

t  ->  oo J 


which  is  less  than  r  -  n  -  1.  Notice  in  figure  4.2,  that  even  though  the  tracking  error  does 
converge  to  zero,  the  estimate  ot  m  does  not  converge  to  its  true  value  of  2.0 . 

In  experiment  lb, 


qd  =  0.0,  qd  =  1.0  =>  Rankf  } ]  =  1 


which  equals  n.  In  the  second  experiment,  shown  in  figure  4.3,  the  estimate  of  m  does  converge 
io  its  true  value.  Thus  closed  loop  learning  is  obtained. 


ILL  Manipulator  with  2  DQE 

The  second  simulation  was  done  for  a  2  DOF  manipulator  as  shown  in  figure  4.4 


figure  4.4:  Simulated  2  DOF  manipulator. 

The  dynamics  of  the  2  DOF  manipulator  are 

Ti  =  Dnqi  +  D12  ^2  +  2  0^0)2 +Cq22  +  Gi  (4-4) 

t2  =  Di2  Pi  +  d22  ^2 ' c  9i2  +g2 

where 

Dn  s  mi  h2  +  m2  h2  +  m2  h2  +  *  m2  h  h  cos(<l2)» 

D12  =  m2 122  +  m2  12  cos(q2), 

D22  =  m2 122 , 

C  =  m2  ^  12  sin(q2) 

G!  =  g(mi  +  m2  )lj  sin(qi)  +  g  m2 12  sin(qj  +  q2) 

G2  =  g  m2 12  sin(q!  +  q2) , 

with  irq  =-m2  =  10.0,  lt  =  12  =  1.0,  g  =  9.81  . 


The  parameters  are  defined  as 

0!  =  (mi+m2)  lt2  =  20.0 
02  =  m2l22=  10.0 
03  =  m2lj  12  =  10.0  . 


04  =  g(mj  +  m2  )1[  =  196.2 
05  =  g  m2 12  =  98. 1 

(4-5) 


Which  leads  to  the  reparameterization 
Y(q,q,q,q)  = 


qt  q i+ij i  cos(q2)(2q1+q2)-sin(q2)(2q1q2+q2)  sin(qi) 
0  cos(q2)q  j+sinfq^qj  0 


sin(q  i+q^ 
sin(q  j+qJ) 


(4-6) 
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The  control  law  as  specified  by  eq  (3.3)  is 


X  -  Y(  q,  q,  q^W  -  Kd  s,  (4.7) 

where  s  =  diag(  effe],  tq+tj),  Kj  =  diag(1000, 500),  and  the  parameter  adaptation  law  as  defined 
by  (3.4)  was  used  to  update  the  parameter  estimates  0,  with  Ka*i  =  diag(  1000,  1000)  .  The 
dimension  of  Y(  q,  q,  q,,^)  in  the  2  DOF  case  is  2  x  5  so  that  r  >n,  and  at  most  2  parameters  can 
be  guaranteed  to  be  learned  simultaneously.  The  initial  conditions  for  both  experiments  (2a  and 
2b)  are  at 

qi(0)  =  q2(0)  =  qt(0)=q2(0)=0, 
and  the  desired  velocities  for  both  experiments  is 

qdl(0)  =  qd2(0)  =  0. 


In  experiment  2a, 


qdi 


=  0.5,  qd2  =  -0.5  =>  Rank 


YMd>  ] 

Lim  t  ->  oo  | 


1 


which  is  less  than  min(r,  n)  =  2.  Notice  in  figure  4.4,  that  even  though  the  tracking  error  (which  is 
not  shown)  does  converge  to  zero,  the  estimates  of  04  and  05  do  not  converge  to  the  true  values . 


In  experiment  2b, 


'  '  y 

which  equals  n.  The  linearly  independent;columns  for  experiment  2b  correspond  to  parameters 
04  and  05  .  As  shown  in  figure  4.5,  the  estimates  of  04  and  05  do  converge  to  the  true  values . 
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4.z.  Exploratory  scneauie  ior  me  l  uut  manipulator 

Selection  of  the  exploratory  schedule  for  the  2  DOF  manipulator  was  accomplished  by  noting 
that  columns  4  and  5  of  (4.7)  are  functions  only  of  position,  therefore  selecting 

<idl=^d2=!4il=^d2S52;9i  ^  Qdl*  <ld2 suc^  ^at  columns  4  and  5  are  nonzero  as  ej,  e2,  ei ,  •>  0 

will  ensure  that  parameters  04  and  05  are  identified.  Next,  selecting  q<n=qct2=0.0,  and  qdl,  qd2, 
qdl,  q<i2  such  that  column  3  is  nonzero  in  the  limit  ensures  identification  of  parameter  03.  Then 
selecting  qdi»  qd2 such  that  columns  1  and  2  are  nonzero  ensures  the  identification  ofparameters 
04  and  05  .  The  actual  exploratory  schedule  is  as  shown  in  figures  4.6  to  4.8  .  Theoretically,  a 
parameter  is  guaranteed  to  be  identified  when  e^e^pe^  0.0.  However,  in  practice,  due  to 
noise  and  disturbances,  exact  tracking  may  not  occur  and  small  tracking  errors  may  lead  to  small 
errors  in  the  parameter  identification.  In  this  simulation,  a  parameter  is  said  to  be  identified  when 
Maximum(leil,  lc2l  Jejl,  le2l)  <  etoj,  where  etoi  =  0.01,  and  a  further  stipulation  that  attempted 
identification  of  each  parameter  is  to  last  at  least  3  seconds.  The  time  period  corresponding  to 
identification  of  04  and  05  is  0  <  t  <  4.2.  As  can  be  seen  in  figures  4.9  and  4.11,  the  parameter 
estimation  error  for  04  and  05  approaches  zero  as  all  joint  errors  fall  below  0.01  at  t-4.2  .  The 
time  period  corresponding  to  identification  of  03  is  4.2  £  t  <  7.3.  The  parameter  estimation  error 
for  03  approaches  zero  as  all  joint  errors  approach  zero  at  t=7.3  .  Parameters  04  and  05  are 
identified  during  the  time  period  7.3  <  t  <  10.3  . 
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figure  4.1 1.  Parameter  estimation  error  for  parameters  04  and  05  during  exploratory  schedule 


Time  (sec) 


figure  4.12:  Parameter  estimation  error  for  parameters  0h  02  and  03  during  exploratory  schedule. 
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4.3.  Comparison  of  controller  performance  before  and  after  learning 

A  set  of  experiments  was  r  ^formed  to  compare  the  performance  of  the  adaptive  controller  in 
tracking  a  test  trajectory  for  two  experiments:  3a)  without  prior  learning  of  the  parameters  through 
tracking  of  the  exploratory  schedule,  and  3b)  with  prior  tracking  of  the  exploratory  schedule.  The 
test  trajectory  is  shown  in  figures  4.13  to  4.15.  In  experiment  3a,  tracking  of  the  test  trajectory 
was  attempted  by  the  controller.  Two  attempts  were  made  by  the  controller  to  track  this  trajectory. 
In  experiment  3b,  the  controller  first  followed  the  exploratory  schedule  as  detailed  in  section  4.2 , 
then  the  same  test  trajectory  as  used  in  experiment  3a  was  attempted.  Figures  4.16  to  4.18  show 
the  sampled  Cartesian  path  of  the  end  effector  (i.e.  the  tip  of  link  2)  as  the  controller  attempts  the 
trajectory.  Although  there  is  improvement  in  performance  in  the  two  attempts  without  learning  via 
the  exploratory  schedule  (figures  4.16  and  4.17),  figure  4.18  shows  that  the  tracking  error  is 
greatly  reduced  after  the  controller  first  learns  the  parameters  by  tracking  the  exploratory  schedule. 
The  increase  in  performance  is  more  apparent  when  viewed  in  the  joint  space,  as  shown  by  the 
graphs  of  joint  position  and  velocity  errors  in  figures  4.19  to  4.22 . 
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figure  4.14:  Desired  Cartesian  velocity  of  the  end  effector  during  the  test  trajectory. 
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figure  4. 16:  Path  of  the  end  effector  during  first  attempt  by  the  controller,  prior  to  learning  via 

exploratory  schedule. 


figure  4.17:  Path  of  the  end  effecvr  .luring  second  attempt  by  the  controller,  prior  to  learning  via 

exploratory  schedule. 
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flgure  4.18:  Path  of  the  end  effector  during  first  attempt  by  the  controller,  after  learning  v 

exploratory  schedule. 
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figure  4.21:  Joint  velocity  errors  during  second  attempt  by  the  controller,  prior  to  learning  via 
exploratory  schedt.e. 


figure  4.22:  Joint  velocity  errors  during  first  attempt  by  the-controller,  after  learning  via 
exploratory  schedule. 
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In  this  article,  we  explored  the  relationship  of  learning  and  adaptation  in  robot  control.  We 
have  shown  how  the  design  of  ES  is  an  essential  aspect  of  learning.  We  have  related  ES  design  to  - 
the  issue  of  persisterit'excitation.  The  Exploratory  Schedule  represents  a  weaker  criteria  than 
persistent  excitation 

The  robot  model  parameters  were  viewed  as  state  variables.  They  were  used  to  form  the 
augmented  state  space  in  which  the  asymptotic  stability  of  the  origin  implies  both  asymptotic 
learning  and  asymptotic  tracking.  A  theorem  regarding  constructive  sufficient  conditions  for 
asymptotically  stable  closed  loop  learning  was  proved,  and  examples  of  learning  in  1  and  2  degree 
of  freedom  (DOF)  manipulators  were  given. 

The  simulation  results  of  the  ES  for  a  2  DOF  manipulator  (section  4.2),  showed  how  in 
practice,  due  to  noise  and  disturbances,  exact  tracking  may  not  occur  and  small  tracking  errors  may 
lead  to  small  errors  in  the  parameter  identification.  The  comparison  of  controller  performance 
before  and  after  learning  (section  4.3)  demonstrated  how  the  tracking  error  can  be  greatly  reduced 
after  the  controller  learns  the  parameters  via  an  Exploratory  Schedule. 

We  have  demonstrated  a  method  for  the  selection  of  exploratory  schedules  for  rigid  robot 
manipulators  employing  inverse  dynamics  controllers  with  direct  parameter  adaptation.  A  natural 
extension  to  this  work  is  to  generalize  the  concept  to  a  broader  class  of  systems.  Immediate 
extensions  may  be  to  other  adaptive  control  laws  e.g.  that  of  [Sadegh-Horowitz]. 

We  have  specified  the  exploratory  schedule  as  a  desired  trajectory  that  is  to  be  followed  to  do 
learning  while  the  manipulator  is  not  doing  other  useful  tasks.  However,  a  more  appealing  idea 
would  be  to  integrate  the  exploratory  schedule  into  any  trajectory  that  is  specified  by  the  path 
planner.  The  exploratory  schedule  would  take  the  form  of  an  exploratory  (or  probing)  signal  into 
the  desired  trajectory  to  cause  asymptotic  learning.  The  injection  of  exploratory  schedules  in  this 
manner  would  be  similar  to  the  concept  of  dual  control  (see  Ref.  [Stengel])  in  that  the  probing 
signal  ES  is  combined  with  the  so  called  cautious  signal. 
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Improving  the  Solution  of  the 
Inverse  Kinematic  Problem  in 
Robotics  Using  Neural  Networks 

Allan  Guez  rijtd  Ziauddin  Ahmad 


A  method  for  solving  the  inverse  kinematic  problem 
(IKP)  for  robotic  manipulators  is  presented  in  this 
article.  The  IKP  finds  the  joint  angle  valnes  of  the 
robot  that  will  place  the  robot's  end-effector  in 
a  desired  position  and  orientation.  There  is  more 
than  one  solution  to  this  problem.  The  problem¬ 
solving  method  presented  is  a  hybrid  approach  in 
which  a  neural  network  is  trained  to  provide  an 
approximate  solution.  The  approximate  solution  is 
then  used  as  an  initial  guess  to  an  iterative  proce¬ 
dure  that  provides  a  final  solution  within  some 
specified  tolerance  in  the  robot’s  work  space.  The 
proposed  hybrid  method  achieves  about  a  twofold 
increase  in  computational  efficiency  for  an  indus¬ 
trial  manipulator  with  more  consistency  in  the 
time  required  to  obtain  the  solution  to  the  robotic 
manipulator.  This  article  was  accepted  for  publi¬ 
cation  August  1089. 


A  robot  consists  of  a  serial  chain 
of  rigid  links  connected  to  each 
other  by  actuated  joints.  In  robot 
kinematics,  the  motion  of  the 
robot  links  is  studied  in  terms 
of  the  robot’s  mechanical  geometry  and  struc¬ 
ture.  The  forward  kinematic  problem  is  that 
of  finding  the  end-effector,  or  gripper,  position- 
and-orientation  m-dimensional  vector  in  a 
Cartesian  space,  X,  as  a  function  of  the 
n-dimensional  joint  vector,  q.  This  can  be 
expressed  as 

X  =  f(q)  (1) 

where  f  is  a  nonlinear,  continuous,  and  differ¬ 
ential  function  determined  by  the  robot’s 
geometry. 

The  inverse  kinematic  problem  is  defined  as. 
given  the  vector  X,  find  the  vector  q.  That  is 
to  say,  solve  equation  1  for  q, 

q  =  f(X).  (2) 


The  IKP  usually  does  not  have  a  unique  solu¬ 
tion.  (The  IKP  is  also  referred  to  as  the  arm 
solution.)  The  significance  of  the  inverse  kine¬ 
matic  problem  is  depicted  in  Exhibit  1.  This 
exhibit  shows  the  command  level  that  gener¬ 
ates  a  task  K,  which  is  passed  to  the  path 
planner.  The  path  planner  genorates  a  feasible 
path  in  the  work  space  (i.e.,  the  Cartesian 
space)  to  accomplish  the  task.  In  order  to  fol¬ 
low  this  path,  the  joints  m"st  be  actuated,  so 
the  joint  values  that  will  make  the  end-effector 
follow  the  Cartesian  path  are  needed.  The 
inverse  kinematics  block  in  the  diagram  makes 
the  desired  transformation  and  provides  the 
controller  with  the  joint  values  for  q  to  be 
followed.  The  controller  then  generates  the 
control  signal,  C,  that  drives  the  joints  to  follow 
the  desired  path.  Additional  information  con¬ 
cerning  the  current  state  of  the  robot  comes 
from  the  sensors. 

A  robot’s  task  is  usually  specified  in  a  Cartesian 
space,  so  the  inverse  kinematics  solutions  are 
the  ones  most  frequently  used.  The  solutions 
can  be  split  into  two  categories:  closed  form 
and  iterative.  Closed-form  solutions  include 
matrix  algebraic  or  geometric  methods.1  In 
general,  closed-form  solutions  are  impossible 
because  f(q)  in  equation  1  is  highly  nonlinear. 
However,  because  of  the  importance  of  a  fast 
solution  to  the  IKP,  robots  are  frequently 
designed  to  have  simplified  inverse  kinematics 
(e.g.,  the  last  three  joint  axes  intersect  in  a 
point).  Thus,  many  existing  industrial  manip¬ 
ulators  possess  closed-form  solutions  to  the 
IKP.  Of  course,  wheneu?r  closed-form  solu- 
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Exhibit  1.  Block  Diagram  of  Inverse  Kinematic  Control 
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tions  are  available,  they  can  be  rapidly  and 
precisely  implemented;  however,  such  solu¬ 
tions  are  customized  to  the  robot  and  require 
accurate  knowledge  of  all  geometric  parame¬ 
ters,  such  as  link  lengths  and  joint  axes 
geometry. 

Iterative  solutions  are  used  for  manipulators 
that  do  not  have  a  closed-form  solution,  which 
include  nonredundant  and  redundant  manip¬ 
ulators.  These  methods  start  with  an  initial 
guess  for  the  solution  and  use  Newton- 
Raphson  or  gradient-descent  algorithms  to 
find  the  correction  vector  to  be  added  to  make 
the  error  small.2  These  methods  are  simple  to 
implement  and  are  inrippendent  of  the  type 
of  manipulator.  However,  they  require  knowl¬ 
edge  of  forward  kinematics,  they  are  compu¬ 
tationally  intensive,  they  may  have  slow  con¬ 
vergence,  and  their  performance  depends  on 
the  quality  of  the  initial  guess. 
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In  many  cases,  there  are  an  infinite  number  of 
configurations  for  q  that  place  the  end-effector 
in  the  prespecified  position  and  orientation 
(i.e.,  the  IKP  has  an  infinite  number  of  solu¬ 
tions).  In  these  cases,  additional  constraints 
on  the  allowed  configurations  or  performance 
functions  are  introduced  to  reduce  the  number 
of  legitimate  joint  configurations  or  to  single 
out  a  unique,  preferable  configuration.3  In  that 
way,  the  IKP  is  modified  to  have  a  finite  num¬ 
ber  of  solutions.  Motion  heuristics  are  then 
used  during  path  planning  to  select  a  unique 
configuration. 

Conventional  methods  for  solving  the  IKP 
have  the  disadvantage  Oi  being  cumpiex  anu 
restricted  to  only  a  class  of  nonredundant 
robots  or  being  slow  in  converging  to  a  solu¬ 
tion.  These  problems  are  addressed  in  this 
article  and  a  combination  of  a  neural  network 
with  a  conventional  method  is  suggested  as 
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a  workable  alternative  to  conventional  meth¬ 
ods  alone. 

The  hybrid  method 

This  approach  to  the  solution  of  IKP  of  robots 
is  an  iterative  procedure  for  which  the  initial 
guess  is  given  by  trained  neural  networks. 
Exhibit  2  shows  the  schematics  of  the  proposed 
method.  Exhibit  2a  shows  an  array  of  neural 
networks.  Each  of  the  networks,  when  pro¬ 
vided  with  the  desired  position  and  orienta¬ 
tion,  Xd,  gives  the  type  of  solution  (in  terms  of 
configuration)  that  nearly  matches  its  training 
data.  The  selector  block,  shown  in  the  exhibit, 
chooses  one  of  the  solutions  and  passes  it  to 
the  iterative  procedure  as  the  initial  guess,  q0. 
The  iterative  procedure  picks  up  the  initial 
guess  and  iterates  until  an  acceptable  solution 
is  obtained. 


Implementing  the  proposed  method  starts  with 
the  robotic  manipulator  for  which  the  IKP  is 
to  be  solved  and  an  untrained  neural  network 
of  suitable  size.  Data  pairs  of  the  end-effector 
position  and  orientation  and  the  corresponding 
joint  values  are  generated.  This  data  is  used 
for  training  the  neural  network,  with  the  posi¬ 
tion  and  orientation  vectors  as  the  input  and 
the  corresponding  joint  values  as  the  desired 
output.  After  the  training  is  completed,  the 
trained  neural  network  is  coupled  with  the 
iterative  method,  as  shown  in  Exhibit  2b,  for 
system  operation.  During  system  operation, 
the  desired  position  and  orientation  of  the 
end-effector  are  provided  to  the  neural  net¬ 
work.  The  neural  network  gives  the  approxi¬ 
mate  solution  based  on  the  learned  connection 
weights.  This  approximate  solution  is  taken 
as  the  initial  guess  by  the  iterative  method  to 
yield  the  final  solution. 


Exhibit  2.  Diagram  of  the  Proposed  Method 


Inverse  Kinematics  Block 


a.  The  selector  block  choosee  an  Initial  guess  from  solutions  provided  by  the 
multilayered  feed-forward  neural  network  array 


b.  Trained  network  Is  coupled  with  the  Iterative  procedure 
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The  multilayered  feed-forward  network  used 
consists  of  the  supervised  back-propagation 
algorithm  used  to  train  the  output  nodes  that 
correspond  to  the  joint  values,  given  by 

q  =  Jq,  q,  . . .  q„]T  (3) 

related  to  the  desired  position  and  orientation 
of  the  end-effector,  Xd,  which  is  provided  to 
the  input-layer  nodes.  The  position  specifica¬ 
tion  consists  of  the  displacement  of  the  origin 
of  the  coordinate  system  (x,  y,  z)  attached  to 
the  end-effector  with  respect  to  the  base  frame. 
The  orientation  specification  consists  of  the 
Eulerian  angles  (i.e.,  <f>,9,tl/).* 

Xd  =  \4>  6  4)xyz\T  (4) 

where  4>  is  the  rotation  around  the  OZ  axis, 

0  is  the  rotation  around  the  OU  axis,  and  < l>  is 
the  rotation  around  the  OW  axis  in  the  given 
sequence  and  in  the  end-effector’s  frame  of 
reference.  The  data  pair  can  be  generated  using 
only  the  forward  kinematics  of  the  robot,  as 
discussed  in  the  following  paragraph. 

A  problem  with  this  arbitrary  set  of  data  points 
is  that  there  is  more  than  one  solution  to  the 
inverse  kinematic  problem.  By  nature,  feed¬ 
forward  neural  networks  can  learn  only  a 
single  transformation,  which  requires  the  se¬ 
lection  of  one  type  of  solution  as  the  training 
data  set.  Each  type  of  solution  is  bounded  by 
singular  regions  (defined  in  the  inset),  which 
correspond  to  the  rank  deficiency  of  the  Jaco¬ 
bian  (also  defined  in  the  inset)  of  the  manipu¬ 
lator.5  Thus  the  data  set  corresponding  to  one 
type  of  solution  can  be  generated  by  starting 
with  any  configuration  and  joint  values,  storing 
the  relevant  data  while  incrementing  in  joint 
space,  and  at  each  step  ensuring  that  the  Jaco¬ 
bian  is  not  near  rank  deficiency.  The  rank 
deficiency  of  the  Jacobian  can  be  easily  verified 
by  taking  the  determinant  of  the  Jacobian  (/), 
or  that  of  (]f)  in  case  of  an  under-  or  over- 
determined  manipulator  and  comparing  the 
result  with  zero.  However,  it  is  desirable  to 
obtain  data  that  is  sufficiently  rich  in  the 
configuration  space.  Two  methods  that  may 
be  used  to  achieve  this  are: 

•  Following  the  isocline  of  a  function  of  the 
joint  variahles 

•  Taking  all  the  combinations  by  varying  one 
joint  at  a  time.6 

Another  problem  encountered  is  in  the  range 
of  the  joint  angles.  For  a  particular  type  of 
solution,  the  entire  joint  range  for  each  joint 


may  not  be  covered;  thus  only  disjoint  portions 
of  the  maximum  joint  range  can  be  used.  This 
problem  of  disjoint  regions  can  be  handled 
in  two  ways: 

«  Modifying  the  network  architecture. 

•  Making  the  regions  continuous.7 

Case  studies 

The  back-propagation  (BP)  algorithm  simulat¬ 
ing  a  three-layer  perceptron  was  employed 
to  tackle  the  problems  described  in  the  fol¬ 
lowing  case  studies.  Continuous  input  and 
output  were  assumed.  The  nodes  assumed 
symmetric  sigmoidal  nonlinearity.8  The  learn¬ 
ing  rate  and  the  momentum  term  assumed 
the  values  of  0.1  and  0.4,  respectively.  Also, 
the  desired  outputs  were  normalized  between 
-  0.9  and  +  0.9.  Training  was  terminated  when 
the  error  rates  were  not  improving. 

Example  1:  A  three-degrees-of-freedom 
planar  manipulator 

A  three-degrees-of-freedom  (DOF)  planar 
manipulator  is  shown  in  Exhibit  3.  Before 
describing  the  simulations  and  results, 
redundance  as  specific  to  a  3-DOF  planar 
manipulator  must  be  defined.  In  a  plane,  two 
DOFs  are  sufficient  for  the  purpose  of  posi¬ 
tioning,  therefore  a  3-DOF  planar  manipulator 
has  one  redundant  DOF.  (A  detailed  definition 
is  given  in  the  Definitions  section.)  This  extra 
DOF  introduces  a  problem  of  an  infinite  num¬ 
ber  of  joint  configurations  that  result  in  the 
same  end-effector  position.  The  problem 
can  be  resolved  by  adding  an  additional  in¬ 
dependent  constraint  to  be  fulfilled  in  addition 
to  finding  the  best  position  for  the  end- 
effector. 


Exhibit  3.  A  Three-Degrees-of-Freedom 
Planar  Manipulator 
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Definitions 


Denavit-Hax&nberg  transformation  The 
Denavit-Hartinberg  matrix,  shown  in  Exhibit 
A,  is  a  representation  of  the  general  displace¬ 
ment  of  one  frame  with  respect  to  the  other. 

It  consists  of  the  rotation  matrix  and  the 
translation  vector.  In  addition  to  these,  the 
matrix  also  contains  the  information  concern¬ 
ing  perspective  transformation  and  a  scaling 
factor. 

Jacobian  The  Jacobian  (of  a  manipulator) 
specifies  a  mapping  from  velocities  in  joint 
space  to  velocities  in  Cartesian  space.*1 
Therefore,  the  Jacobian  J(x,q)  of  a  manipulator 
is  a  matrix  as  defined  below: 

J(x.q)  =  [dxjdqf ]  ;  i  =  /  =  l...n  (A) 

where  m  equals  the  number  of  independent 
Cartesian  variables,  and  n  equals  the  number 
of  joints.  The  columns  of  the  Jacobian  matrix 
show  the  change  in  the  position  and  orienta¬ 
tion  of  the  end-effector  due  to  the  change  of 
the  individual  joints. 

Singularities  A  singularity  corresponds  to  a 
configuration  of  the  manipulator  for  which 
the  end-effector  motion  is  constrained  in 
some  directions.  Also,  at  a  singularity,  the 
Jacobian  is  rank  deficient. 

IJI  =  0  (B) 

There  are  two  types  of  singularities.  One 
corresponds  to  the  boundary  of  the  work  space 
where  the  singularity  means  that  the  manipu¬ 
lator  cannot  cross  this  boundary.  The  other 
singularity  corresponds  to  the  situation  where 
the  axes  of  two  or  more  similar  joints  are 
aligned,  and  a  motion  in  one  of  these  joints 


Exhibit  A.  A  Denavit-Hartenberg  Matrix 
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can  be  canceled  by  a  counter-motion  in  any 
one  of  the  other  aligned  joints. 

Degrees  of  Freedom  (DOF)  The  number  of 
degrees  of  freedom  that  a  manipulator  pos¬ 
sesses  is  the  number  of  independent  position 
variables  that  would  have  to  be  specified  in 
order  to  locate  all  parts  of  the  mechanism.'1 
For  a  robot,  which  is  an  open  kinematic  chain, 
the  degrees  of  freedom  equal  the  number  of 
joints:  each  joint  corresponds  to  one  variable. 

Redundant  BOF  The  number  of  DOF  in 
excess  of  the  minimum  number  of  DOF  re¬ 
quired  to  arbitrarily  position  and  orient  an 
object  in  the  Cartesian  space  is  called  the 
redundant  DOF. 

An  Iteiativo  Procedure  The  schematics  of 
the  iterative  method  is  shown  in  Exhibit  B. 
This  procedure,  commonly  called  the  Newton- 
Raphson  method,  uses  the  linear 
approximation 

=  <7*  +  dqh  (C) 

where  qk  is  the  joint  variable  vector  at  the 


Exhibit  B.  Flow  Diagram  for  the  Iterative 

Procedure  to  Find  the  Joint  Values 


Find: 
G  ,  S(Q) 


Check  \  Yes 


8s  +  GSq=  0 


<7(k^  1)  = 
qW  +  8q(k) 
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Definitions  (Cont) 

kth  iteration  and  dqk  is  the  solution  to  the 
linear  system 

s(qk)  Gk  dqk  =  0,  (D) 

that  is, 

d9k  =  ~Gk' s(qk)  (E) 

Therefore  Equation  C  becomes 

<lk+t  =  q*  -  Gk‘s(qk),  (F) 

where  Gk  is  found  numerically  by  the  central 
difference  formula 

G„  =  (st  -  st)/ 2  dq,  ;  i,j  =  l..n  (G) 

Hera  m  residuals  result  because  of  the  forward 
kinematics  in  the  hand  Cartesian  space 
(m  =  6  for  a  general  manipulator)  (i.e., 

(f(q)  —  Xd),  where  f(q)  is  the  forward  kine¬ 
matics  and  Xd  is  the  desired  end-effector 
position  and  orientation  vector).  The  n-m 
residuals  correspond  to  the  constraints  to  be 
optimized  and  correspond  to 

Zh  =  0  (H) 

where 


where 


^  ( /n-m/m '  ■  f n-m 1  ( 

I 

/m  =  a  square  matrix  corresponding 
to  any  m  columns  of  the  Jaco¬ 
bian  matrix. 

,m  =  transpose  of  the  matrix  con¬ 
sisting  of  columns  not 
included  in  Jm  matrix. 

.m  =  an  identity  matrix  of  order 


h,  =  dH/dq,  ;  i  =  l...n  (J) 

where  H  defines  the  criteria  function  to  be 
optimized. 

Here  it  may  be  noted  that 

s(q)  =  l(f(9)  -  Xd)T:  (Zh)T|T 
and 

Glq)  =  ir(q)-ld(Zh)/d(qj)T\r. 

where  f(q)  is  the  Jacobian  of  the  manipulator 
and  d  is  the  differential  operator. 


ihe  performance  criterion  optimized  for  the 
3-DOF  planar  manipulator  was  the  manipula- 
bility  index,  defined  as 

H  =  vj/ri  (5) 

where  /  is  the  Jacobian  of  the  manipulator. 


The  forward  kinematics  of  this  manipulator  is 
given  by 

f,(0,,02)  =  X  =  L,  sin(0,)  -  Lz  sin  (0,  +  02)  (6) 

-  L3  sin  (0,  +  02  +  0j)  =  0 

f2(0,,02)  =  Y  =  L,  cos  (9,)  -  L2  cos  (0,  +  0 2)  (7) 

-  L3  cos  (0,  r  8,  -  6j)  =  0 

where  Lx,  Lz,  and  L3  are  the  link  lengths.  By 
the  iterative  procedure  (described  in  the  inset) 
the  m  (=  2)  residuals  correspond  to  the 
equations 

f,(0,.02)  -  Xd  =  0  (8) 

M0..02)  -  I'd  =  0  (9) 

and  the  n-m  (=  1)  residual  corresponds  to 
Zh  =  0  (10) 

—  2ll  V  (S22S23  —  C23S3) 

UVJ(2S2jS233  —  3S2233S3) 

+■  2u3V2S2Sj3  *  u2v2(c33  -  c22) 

+  2UV2(S2.92233  -  CjSlj)  -  u3vs22s3 
-  2u2v2s2s3 

where 

s„  =  sin(0,  +  0;) 
s„k  =  sin(9,  +  0,  +  0*) 
siiki  =  sin(0,  +  0,  +  0t  +  0,) 
c„  =  cos(0,  +  0;) 

The  network  for  this  case  consists  of  two  input 
nodes  and  three  output  nodes  and  two  hidden 
layers,  each  containing  10  nodes.  The  network 
was  trained  on  a  data  set  giving  the  X.Y  coor¬ 
dinates  for  the  end-effector  and  the  three 
joint  angles  that  optimized  the  manipulability 
criterion.  This  data  set  contained  121  input/ 
output  pairs  that  were  obtained  by  dividing 
the  work  space  into  10  angular  and  10  radial 
regions  and  taking  the  data  points  at  the 
corners  of  the  intersection  of  these  regions. 

The  program  that  generated  this  data  set  em¬ 
ployed  the  Newton-Raphson  method  for  solv¬ 
ing  the  equations,  and  the  Jacobian  of  the 
manipulator  was  found  by  the  central  differ¬ 
ence  formula.9  In  addition,  the  single  inverse 
was  ascertained  by  taking  the  initial  condition 
as  the  current  configuration  for  the  arm  solu¬ 
tion  to  a  subsequent  nearby  end-effector  posi¬ 
tion.  The  link  lengths  were  0.5m,  0.4m.  and 
0.1m  for  links  1.  2.  and  2  rpcnprfjyolv  cr>  itioi 
the  maximum  reach  was  1.0m. 

The  result  of  this  training  is  shown  in  Exhibit 
4.  Exhibit  4a  shows  that  the  error  begins  to 
increase  as  the  manipulator  moves  nearer  to 
the  singularity.  In  this  case,  singularity  occurs 
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Exhibit  4.  Results  of  a  Neural  Network  Trained  to  Give  a  Manipulability 
Index  Optimized  Solution  for  a  3-DOF  Planar  Manipulator 


0.0  X-axis  0.9 

a. 


02-desirod  (radians)  03-desired  (radians) 


c. 


d. 


Note: 

a  shows  the  error  in  end-effector  positioning  at  'he  center  ol  the  circle  ana  the  magmtuae  is  aepictea  by  the 
aiameter  of  the  circles  b  through  d  snow  ’he  -eiationsnip  Between  the  aesired  joint  angles  ana  those  given  by 
the  neural  networK, 


at  02  =  03  =  0  radians  anti  >  orrespunds  to  a 
circle  of  radius  of  lm  around  the  point  of 
origin  in  the  Cartesian  spat  e  Exhibits  4b 
through  4d  show  the  desired  punt  values  and 
those  given  by  ihe  neutal  m  tvvurk  The  neurai 
network  learned  the  solution  iatrlv  well 

When  the  neural  network  solution  was  coupled 
with  Newton-Raphson  iterative  procedure,  it 
resulted  in  nearly  a  five-fold  increase  in  com¬ 
putational  efficiency.  Exhibit  5  gives  a  detailed 


description  of  the  number  of  operations  re- 
quiied  for  the  iterative  method  and  the  solution 
given  by  the  neural  network.  It  was  found 
that,  on  the  average,  the  iterative  method  alone 
with  random  initial  guess  required  12  itera¬ 
tions  to  give  the  solution,  whereas  the  iterative 
method  that  used  the  solution  given  by  the 
ural  network  as  the  initial  guess  required, 
on  the  average,  only  two  iterations.  This  in¬ 
formation  is  taken  into  account  in  the  last  two 
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Exhibit  5.  Number  of  Operations  Required  for  the  3-DOF 
Manipulator  Solution 


\  Methods 

Operatlon^V 

Iteratlvs 

Method 

Neural  Net 
Solution 

Random 
Initial 
Guess 
(A  ^12B) 

MFN 
Solution 
as  Initial 
Guess 

(A-*-  2B  ~C) 

Constant 

(a) 

One 

Iteration 

(b) 

A 

B 

C 

D 

E 

Assignment 

Statements 

27 

419 

434 

5055 

1299 

Structured 

Statements 

— 

61 

193 

732 

315 

Multiplications 

15 

885 

219 

10275 

1944 

Additions 

16 

787 

242 

9460 

1832 

Function 

Calls 

13 

34 

30 

421 

111 

Total 

71 

2156 

1118 

H 

bEM 

Note: 

MFN  Multilayered  feed-lorward  network 


columns  of  Exhibit  5  so  that  the  corresponding 
numbers  in  the  last  row  give  an  indication 
of  the  difference  in  time  required  for  the  final 
solution  using  both  the  conventional  method 
and  the  neural  network  augmented  iterate e 
method. 

Example  2:  PUMA  560  robotic 
arm  positioning 

The  PUMA  560  consists  of  6  revolving  |oints 
(This  is  a  6-DOF  robot.)  The  first  three  joints, 
which  correspond  to  the  waist,  shoulder,  and 
elbow  motions,  contribute  mainh  to  position¬ 
ing  the  end-effector.  The  last  three  joints  i  or- 
respond  to  the  wrist  motion  and  contribute 
mainly  to  orienting  the  end-effector  The 
PUMA  560  parameters  were  taken  from  Ro¬ 
botics,  Control,  Sensing,  Vision,  and  Intelli¬ 
gence.10  These  parameters  are  given  in  Exhibit 
6.  This  manipulator  was  chosen  for  its  ease- 
of-use.  Generation  of  data  and  verification  of 


results  are  simplified  because  the  manipulator 
has  a  closed-form  solution.  PUMA  has  eight 
solutions  for  a  given  position  and  o.  mtation. 
which  are  signified  by  right  shoulder  or  'eft 
shoulder,  above  elbow  or  be'ow  e'bo.v,  and 
wrist  or  down  wrist. 

The  training  data  used  in  this  case  corresponds 
to  a  eft  arm,  below  elbow  and  up  wnst  configu¬ 
ration  In  the  simulations,  the  joint  limits  used 
fur  the  M\th  joint  were  -  180  degrees  and 

-  180  degrees,  instead  of  -260  degrees  and 

-  260  degrees. 

As  mentioned  in  the  previous  section,  there 
mav  be  more  than  one  disjoint  region  in  which 
th  1  solution  for  a  particular  configuration  lies, 
in  the  case  of  PUMA  560  wiih  the  consiueieu 
configuration,  the  values  for  joint  number  6 
lie  in  two  disjoint  regions.  One  corresponds  to 
the  region  between  —  tt/2  and  -it,  and  the 
other  corresponds  to  the  region  between  ir/2 
and  it.  These  need  to  be  trained  separately  or 
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Exhibit  6.  PUMA  Robot  Arm  Link  Coordinate  Parameters 


6, 

Joint  f 

Variable 

(degrees) 

(degrees) 

1 

90 

-90 

2 

0 

0 

3 

90 

90 

4 

0. 

0 

-90 

5 

0* 

0 

90 

6 

0 

0 

a, 

d, 

Lower 

Limit 

Upper 

Limit 

(mm) 

(mm) 

(degrees) 

(degrees) 

0 

0 

-160 

160 

431  8 

149  09 

-225 

45 

-20  32 

0 

-45 

225 

0 

433.07 

-110 

170 

0 

0 

-100 

100 

0 

56.25 

-266 

266 

else  combined  by  transforming  these  two  re¬ 
gions  by  it  and  -ir.  respectively.  In  this  case, 
they  were  combined  so  that  the  two  regions 
lie  in  the  first  and  the  fourth  quadrant  instead 
of  the  third  and  second  quadrants  and  then 
trained  with  a  single  network. 

The  network  in  this  case  consisted  of  six  input 
nodes,  one  output  node  each  for  the  six  joints 
and  two  hidden  layers  for  each  joint  consisting 
of  32  nodes  in  the  first  layer  and  eight  nodes 
in  the  second  layer.  The  standard  back-propa¬ 
gation  algorithm  was  used  with  the  learning 
rate  of  0.1  and  the  momentum  term  of  0.2. 

The  training  was  done  for  138  presentations 
of  the  800-sample  data  set. 

The  average  error  and  the  standard  deviation 
of  the  error  in  the  solution  given  by  the  neural 
network  for  each  joint  taken  over  100  samples 
is  charted  in  Exhibit  7.  This  exhibit  shows 
that  the  solution  given  by  neural  networks  is 
more  scattered  for  joints  4  to  6  as  compared 
with  joints  1  to  3. 

To  compare  the  results  of  a  random  initial 
guess  to  the  initial  guess  provided  by  the 
neural  network,  the  initial  guess  was  presented 
as  the  actual  solution  uniformly  perturbed 


Exhibit  7.  Statistics  of  the  Neural 
Network’s  Solution 


Joint 

1 —  Error  In  Degrees  — i 
i  Standard  1 

No 

Auorona 
•  ■  - - a- 

no%iiatjrtn 

i 

4  06 

13  428 

2 

-0  16 

30  492 

3 

5  64 

11  52 

4 

3.66 

61  236 

5 

-3  70 

24  912 

6 

-6  02 

36  828 

randomly  between  a  fraction  of  joint  limits  to 
the  iterative  method  alone  for  100  samples 
that  corresponded  to  each  fraction.  The  average 
number  of  iterations  for  the  100  samples  pre¬ 
sented  for  each  fraction  of  perturbation  of  this 
informed  Newton-Raphson  method  (INRM)  is 
plotted  in  Exhibit  8.  For  these  same  samples, 
requiring  the  positioning  and  orientating  of 
the  end-effector,  the  guess  given  by  the  neural 
network  was  also  presented  to  the  i'erative 
method  as  the  initial  guess.  The  corresponding 
average  of  the  100  samples  for  the  solution  to 
the  initial  guess  provided  by  the  neural  net¬ 
work  is  also  included  in  this  exhibit.  The 
average  number  of  iterations  for  the  neural 
network  method  corresponds  to  less  than  20% 
perturbation  with  respect  to  the  joint  limits 
around  the  actual  solution  for  the  INRM.  In 


Exhibit  8.  Comparison  of  the  Neural  Network 
Hybrid  Method  with  the  Informed 
Newton-Raphson  Method 


Per  Unit  of  Joint  Limits 
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Exhibit  9.  Comparison  of  the  Neural  Hybrid 
Method  with  the  Fixed  Estimate 
Newton-Faphson  Method, 
for  100  Samples 

I - No  of  Iterations - 

j  Standard 

Method  |  Average  Deviation 

Proposed  9  96  12  95 

Newton-  21.09  18.90 

Raphson 


simulations,  equation  D  in  the  Definitions 
section  was  solved  by  the  Gaussian  elimination 
method  and  partial  pivoting.  The  maximum 
number  of  iterations  allowed  for  the  i'  ’rative 
method  was  100.  The  iterative  method  was 
successfully  terminated  when  the  net  n  of  the 
difference  between  the  desired  and  actual 
end-effector  position  and  orientation  was  less 
than  1.0E-4. 

Next,  the  neural  hybrid  method  was  evaluated 
by  giving  a  fixed  estimate  to  the  iterative  pro¬ 
cedure.  This  fixed  estimate  was  taken  as: 

0,  =  0.  02  =  0.  03  =  it/4,  0.,  =  0.  05  =  IT/  * 
and  06  =  0,  which  is  a  configuration  corre¬ 
sponding  to  the  one  on  which  the  neural  net¬ 
work  was  trained.  The  average  and  standard 
deviation  for  the  number  of  iterations  for  the 
proposed  method  and  the  fixed  estimator,  in  a 
run  of  100  data  points,  are  given  in  Exhibit  9. 

The  neural  hybrid  method  achieves  more  than 
a  twofold  efficiency  in  computing  with  more 


consistency.  Moreover,  the  time  taken  by  the 
neural  network  equals  two  time  units  of  the 
iterative  procedure,  which  is  less  than  10%  of 
the  time  required  to  get  the  solution  by  the 
fixed  estimator  method. 

Exhibit  10  shows  the  comparison  plots  for 
two  cases  indicating  the  sequence  of  Newton- 
Raphson  iterations  for  the  fixed  initial  guess, 
and  the  guess  given  by  the  neural  network. 
Here,  the  error  along  the  ordinate  axis  corre¬ 
sponds  to  the  norm  of  the  error,  defined  as: 

Erroi  =  V  <=;  -  ef.  -  e;  ej  +  e;  - 

where  e  is  the  difference  between  the  desired 
and  the  actual  values,  and  the  subscripts  indi¬ 
cate  the  variable  for  which  this  difference  is 
taken.  Exhibit  11  shows  the  initial  guesses  by 
the  fixed  estimator  and  the  neural  network 
corresponding  to  Exhibits  10a  and  10b  respec¬ 
tively.  Also,  these  tables  show  the  actual  de¬ 
sired  solution  and  the  number  of  iterations 
taken  for  each  ot  the  two  types  of  initial 
guesses.  The  allowed  end-effector  position 
and  orientation  error  in  this  case  was  1.0E-7. 

Conclusion 

Neural  networks  were  trained  to  learn  the 
inverse  kinematic  problem  of  a  manipulability 
optimized  solution  of  a  3-DOF  robot  and  a 
6-DOF  robot.  It  was  tound  that  .he  neural  net¬ 
works  were  able  to  learn  the  transformations 
fairly  well  and  results  using  untrained  data 
indicated  that  the  neural  networks  are  able  to 
generalize  these  transformations. 


Exhibit  10.  Sequence  of  Newton-Rapksoi  Iterations  for  the 

Neural  Network's  Guess  and  the  Fixed  Initial  Guess 


b  Case  2.  For  10  Iterations 
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Exhibit  11.  Initial  Guess  and  Solution 
Corresponding  to  Examples 
Displayed  in  Exhibit  10 


Fixed  Initial 

Initial  Guess 

Estimate 

by  NN 

(26  Iterations) 

(6  Iterations) 

Solution 

Joint- 1 

0  00 

105.50 

89  02 

Joint-2 

0  00 

24  74 

6  82 

Joint-3 

45  00 

63  46 

54  19 

Joint-4 

0  00 

-7  17 

37  72 

Joint-5 

45  00 

56  48 

45  21 

Joint-6 

0  00 

170  32 

173  46 

Case  1 


Fixed  Initial 

Initial  Guess 

Estimate 

by  NN 

(10  iterations) 

(8  Iterations) 

Solution 

Joint-1 

0  00 

-43.25 

-12  76 

Joint-2 

0  00 

-40  57 

-51  26 

Joint-3 

45  00 

69  54 

82.56 

Jomt-4 

0.00 

-  12  37 

-85.25 

Joint-5 

45  00 

-58  34 

-82  76 

Joint-6 

0  00 

160  80 

113  35 

Case  2 


With  the  increase  in  the  complexity  of  the 
transformation,  which  occurs  with  the  increase 
in  the  number  of  DOFs,  the  size  of  the  network 
needs  to  be  increased.  This  means  that  the 
number  of  hidden  nodes  must  be  increased. 
But  an  increase  in  the  number  of  nodes  by 
one  increases  the  number  of  weights  bv  the 
sum  of  the  nodes  in  the  two  adjacent  layers. 
Therefore,  the  training  time  per  iteration  cycle 
of  training  increases,  and  so  does  the  overall 
training  time  required  to  train  the  network. 
Therefore,  for  larger  manipulators  the  training 
time  may  increase  exponentially 

The  proposed  hybrid  method,  which  takes  the 
solution  given  by  the  trained  neural  network 
as  an  initial  guess  to  an  iterative  procedure 
(Newton-Raphson  in  this  <  um>|.  <  ombines  the 
advantages  of  the  neural  network  and  iterative 
methods;  the  neural  network  is  independent 
of  the  type  of  manipulator  and  simple  to 
implement.  Only  forward  kinematics  is  re¬ 
quited  fot  this  method  aim  tins  i.uuiuiiiuuuii 
results  in  a  fivefold  increase  in  <  omputational 
efficiency  for  a  3-DOF  planar  manipulator 
and  a  twofold  increase  for  the  Pl'MA  560  (6- 
DOF)  robot.  The  decrease  m  >  omputational 
efficiency  foi  the  6-DOF  case  is  due  to  the 
trade-off  between  the  size  ot  the  network  and 


the  training  time,  and  the  accuracy  of  the 
solution  given  by  the  neural  network.  The 
training  time  is  decreased  with  a  decreasa  in 
the  size  of  the  network  at  the  expense  of  the 
accuracy  of  the  solution. 

Where  otherwise  the  number  of  iterations  may 
be  unacceptable  for  real-time  operation,  an 
initial  good  guess  given  by  the  neural  network 
cuts  the  number  of  iterations  to  only  a  very 
few.  which  allows  a  better  operation  of  the 
manipulator.  In  such  cases,  minimal  process¬ 
ing  is  required  within  each  control  cycle, 
and  real-time  control  performance 
improves.  A 
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Neurocontroller  Design  Via  Supervised  and 
Unsupervised  Learning 
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Abstract.  In  this  paper  we  study  the  role  of  supervised  and  unsupervised  neural  learning  schemes  in  the 
adaptive  control  of  nonlinear  dynamic  systems.  We  suggest  and  demonstrate  that  the  teacher  s  knowledge 
in  the  supervised  learning  mode  includes  a-prion  plant  sturciural  knowledge  which  may  be  employed  in 
the  design  of  exploratory  schedules  during  learning  that  results  in  an  unsupervised  learning  scheme.  We 
further  demonstrate  that  neurocontrollers  may  realize  both  linear  and  nonlinear  control  laws  that  arc  given 
explicitly  in  an  automated  teacher  or  implicitly  through  a  human  operator  and  that  their  robustness  may 
be  superior  to  that  of  a  model  based  controller  Examples  of  both  learning  schemes  are  provided  in  the 
adaptive  control  of  robot  manipulators  and  a  cart-pole  system 

Key  words  Nonlinear  control,  neurocomputt  ig.  global  linearization,  supervised  learning,  unsupervised 
learning,  robot  control,  manual  tracking. 


1.  Introduction 

The  recent  revival  in  neuroengineering  research,  which  started  in  1982.  has  focused 
mainly  in  the  field  of  pattern  recognition  and  signal  processing.  Only  a  few  efforts 
have  dealt  with  applications  to  control  engineering. 

The  massive  parallelism,  natural  fault  tolerance  and  implicit  programming  of 
neural  network  computing  architectures  suggest  that  they  may  be  good  candidates 
for  implementing  real-time  adaptive  controllers  for  large-scale  nonlinear  dynamic 
systems. 

Several  neural  network  models  and  neural  learning  schemes  were  applied  to  system 
controller  design  during  the  last  three  decades.  Widrow  and  Smith  (1964)  utilized  the 
ADA  LINE  (Adaptive  Linear  Neuron)  and  M  ADA  LINE  (Many  A  DA  LINES)  archi¬ 
tectures  and  the'Widrow-Hoff  (LMS)  algorithm  to  provide  a  ‘bang-bang’  (control 
where  the  applied  force  is  >'onstant  in  the  +  or  -  direction)  type  of  control  through 
duplication  of  a  known  switching  surface,  for  the  linearized  dynamics  of  a  cart-pole 
system  Bario.  Anderson  and  Sutton  (1983)  developed  the  ASE  (Adaptive  Search 
Element)  to  perform  unsupervised  learning  and  ‘bang-bang’  control  of  a  cart-pole 
system  The  dv  names  were  assumed  to  be  totally  unknown.  Psaltis.  Sideris  and 
Yamamura  ( i9S7)  proposed  using  the  Back  Error  Propagation  algorithm  (BEP)  (see 
Rumelhart  ft  al  (1986))  and  'specialized  learning'  to  reduce  the  total  error  of  the 
controlled  process  in  specific  regions  of  interest  Kawato.  Furukawa  and  Suzuki 
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(1987)  used  the  LMS  algorithm  and  a  linear  combination  of  specific  nonlinear 
subsystems  to  perform  ‘feedback  learning'  of  robot  control.  Guez.  Eilbert  and  Kam 
(1988)  proposed  a  ‘neural  estimator'  with  a  steady-state  topology  that  corresponds  to 
optimal  control  laws.  Guez  and  Selinsky  (1988)  applied  BEP  in  the  design  of  trainable 
adaptive  controllers  (TACs). 

In  all  of  the  works  cited  above,  the  learning  schemes  were  totally  independent  from 
the  control  task.  We  believe  that  some,  however  small,  a-pnori  knowledge  of  the  plant 
dynamics  is  always  available.  We  expect  that  utilizing  the  a-priori  knowledge  in  the 
learning  phase  will  improve  the  neurocontroller  performance. 

In  this  paper,  we  report  preliminary  results  in  neurocontroller  design  which  we 
found  encouraging.  In  Section  2.  we  describe  how  a-priori  structural  knowledge  of 
robot  dynamics  may  be  employed  in  an  unsupervised  learning  routine  for  the  auto¬ 
mated  design  of  a  nonlinear  robot  controller.  In  Section  3,  we  employ  supervised 
learning  with  various  teaching  modes  and  teacher  models,  and  compare  the  neurocon¬ 
troller  robustness  to  that  of  a  model  based  controller.  Section  4  provides  a  discussion 
and  conclusion. 


2.  Unsupervised  Learning  of  Control 

In  supervised  learning,  we  assume  the  existence  of  some  teacher  that  is  capable  of 
demonstrating  the  required  control  performance.  That  is.  in  order  to  train  a  neuro¬ 
controller  to  control  some  dynamic  system,  we  must  have  a  controller  already  capable 
of  controlling  the  system  (see  Psaltis  et  al;  1987;  Widrow,  1987;  Guez  and  Selinsky. 
1988).  This  is  not  a  detriment  in  the  case  of  a  human  trained  controller  as  it  can  be 
used  to  automate  a  previously  human  controlled  process.  In  the  case  of  automated 
linear  and  nonlinear  teachers,  the  dynamics  of  the  controlled  system  must  be  known 
a-priori  for  the  design  of  the  teacher.  We  now  present  the  preliminary  results  of  a 
neurocontroller  learning  to  control  a  dynamic  system  where  only  a  general  knowledge 
of  ihe  dynamic's  structure  is  known  and  no  external  teacher  is  available.  The  system 
under  consideration  is  a  robot  manipulator.  This  problem  is  of  particular  interest  as 
the  full  dynamics  of  a  robot  are  rarely  known.  Furthermore,  the  advent  of  new  direct 
drive  robots  requires  a  controller  capable  of  compensating  for  all  of  the  nonlinearities 
of  the  robot. 


2  I  ROBOT  DYNAMICS 

For  a  robotic  manipulator  consisting  of  an  open  kinematic  chain  of  rigid  links  the 
dynamics  can  be  described  by 


/(</,  /)£/(/)  +  H(q .  q,  t)  +  Bq(t)  +  G(q,  t)  =  T{t). 


(1) 
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where  q(t)  =  the  n  x  1  vector  of  joint  linear  or  angular  positions. 

q(t)  =  the  n  x  1  vector  of  joint  linear  or  angular  velocities. 

q(t)  =  the  n  x  1  vector  of  joint  linear  or  angular  accelerations. 

I(q,  t)  =  the  n  x  n  matrix  of  terms  related  to  inertial  forces. 

H(q,  q,  t)  =  the  n  x  1  vector  of  terms  related  to  centripetal  and  coriolis  forces. 

B  =  the  n  x  n  diagonal  matrix  of  viscous  friction  terms. 

G(q ,  t)  =  the  n  x  1  vectc:  of  terms  related  to  gravitational  forces. 

T(t)  =  the  n  x  1  vector  of  driving  forces  or  torques,  and 


n  =  the  number  of  degress  of  freedom  (DOF)  of  the  manipulator. 


More  details  on  the  derivation  of  the  dynamics  can  be  found  in  Paul  (1981).  The 
dynamics  of  the  robot  are  highly  nonlinear,  coupled,  time  varying  and  configuration 
dependent.  Conventional  control  of  robots  has  relied  on  PID  control  of  each  joint 
individually.  These  controllers  have  been  relatively  effective  because  the  slow  speed 
and  high  gear  reduction  of  the  robots  reduce  the  nonlinearities  introduced  by  the 
inertial,  centripetal  and  Coriolis  forces.  However,  for  high  speed  operation  and 
accurate  trajectory  following,  controllers,  which  compensate  for  the  nonlinearities  of 
the  robot's  dynamics  are  required. 


2.2.  FEEDBACK  LINEARIZED  AND  DECOUPLED  CONTROL  FOR  A  MANIPULATOR 

In  feedback  linearization  or  the  computed  torque  method  (see  Guez.  1982;  Fu. 
Gonzalez  and  Li.  1987),  nonlinear  feedback  described  by  a  Feedback  Linearizing  and 
Decoupling  Transform  (FLDT)  is  used  to  ‘cancel’  the  nonlinearities  of  the  system  and 
transform  it  to  a  linear  controllable  one  as  shown  in  Figure  I.  The  dynamics  of  the 
resulting  system  appear  linear  and  may  be  controlled  using  linear  control  theory.  A 
major  advantage  of  this  type  of  control  is  that  the  poles  of  the  transformed  system 
may  be  arbitrarily  assigned  using  state  feedback. 


l  ie  I  Block  diagram  of  transformed  system 
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The  FLDT  for  an  n  degree  of  freedom  (DOF)  manipulator  can  be  formed  as 
follows.  The  dynamic  equations  (1)  may  be  rewritten  in  the  form 

q(t)  =  /(?,  t)-'[T(t)  -  <?,  0  -  G(q .  ,)),  (2) 

where 

W(q,  q,  t)  =  H(q.  q,  t)  +  Bq(t).  (3) 

The  control  law  is 

T(t)  =  W(q,  q ,  t )  +  G(q,  t)  +  l(q ,  t){kpe  +  kve  +  &(/)).  (4) 

where  e  =  qu(t)  -  q(t). 

Here,  qd(t)  is  the  position  reference  signal  and  k„,  kv  are  appropriately  chosen  for  pole 
placement  of  the  system,  the  transformed  system  is  then  equivalent  to  one  described 
by 

ij(t)  =  kpe  +  kve  +  qd{t)  (5) 

which,  when  kp  and  kv  are  appropriately  chosen,  results  in  an  asymptotically  stable 
system. 

:  3  UNSUPERVISED  LEARNING  OF  ROBOT  MANIPULATOR  DYNAMICS  USING 

HIERARCHICAL  NN 

We  present  a  method  for  the  control  of  robot  manipulators  based  only  on  a-priori 
knowledge  of  the  number  of  degrees  of  freedom  and  the  assumption  that  the 
manipulator  is  an  open  serial  linkage  of  rigid  links.  We  employ  a  hierarchical  neural 
network  computing  architecture  for  the  automated  design  of  an  adaptive  controller. 
A  closer  examination  of  the  manipulators'  dynamics  reveals  that  they  can  be  rep¬ 
resented  by  a  weighted  linear  combination  of  suitable  functions.  Note  that  the 
functions  for  a  general  class  of  manipulators  are  a-priori  known.  We  propose  that 
the  nonlinear  functions  may  be  trained  into  a  series  of  three-layer  feedforward 
network  modules  prior  to  use  and  then  combined  online  in  a  fourth  layer.  Thus, 
information  available  prior  to  use  is  utilized  in  one  level  of  the  hierarchy,  while 
information  that  can  only  be  obtained  online  is  learned  in  another  level.  The  general 
architecture  is  shown  in  Fig.  2(a),  (b) 

We  will  use  a  linear  control  law  with  added  nonlinear  compensation  to  control  the 
manipulator.  During  initial  purely  learning  phase  and  whenever  the  manipulator  is 
not  needed  for  production,  we  will  generate  appropriate  exploratory  schedules  to 
isolate  and  identify  the  nonlinear  compensation  terms.  During  the  purely  learning 
phase,  the  desired  traiectorv  for  the  manipulator  is  provided  by  the  exploratory 
schedule  planner  <HSP).  In  the  learning-production  phase,  the  path  planner  ( PP) 
produces  the  trajectory  for  the  production  task  and  the  nonlinear  compensation  terms 
are  learned  during  suitable  parts  of  the  trajectory. 
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(a) 


(b) 


Fig.  2.  (a)  Unsupervised  learning  architecture.  Functional  block  diagram  and  (b)  neurocontroller 
architecture. 

In  either  the  purely  learning  or  learning-production  phase,  a  simple  expert  system 
is  used  to  identify  the  appropriate  conditions  for  learning  the  nonlinear  compensation 
terms  as  follows.  Let  G0(q),  (V0(q,  q)  and  /„(</)  be  the  outputs  of  the  G(q).  W(q.  q)  and 
I(q)  compensation  networks,  respectively.  After  receiving  the  desired  trajectory,  the 
control  that  is  to  be  applied  to  the  robot  is  calculated  using  the  feedback  c  ; 
from  the  linear  controller  and  the  control  from  the  learned  nonlinear  comper-.” 
terms.  The  network  first  learns  the  G(q)  compensation  terms.  Then  the 
compensation  can  be  learned. 

Learning  of  the  G(q)  terms  is  accomplished  during  positioning  control  phases  of  the 
trajectory.  At  the  steady  state  of  positioning  control,  q  -  q  -  0.  Then  from  ( I )  it  can 
be  seen  that 

T  =  G(q).  (6) 

The  applied  torque  at  steady  state  can  be  used  to  learn  the  G(q)  compensation 
network.  Learning  of  the  W(q,  q)  compensation  terms  is  performed  during  the 
velocity  control  phases  of  the  trajectory.  To  isolate  the  W(q.  q)  terms,  notice  that  at 
q  =  constant,  q  =  0  and 

T  =  W(q .  q)  +  G(q). 


(7) 
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But  G(q)  has  already  been  identified  and  is  available  as  G0(q ),  so  that  we  may  isolate 
lV(q,  q)  to  be  used  in  learning.  The  inertia  related  terms  I(q)  may  be  continuously 
evaluated  using  the  a-priori  known  relationship  (see  Slotine  and  Li.  1987) 

d/(<7,  t)/dt  =  Hm(q,  q)  +  Hm(q,  q)T ,  (8) 

where  H„(q,  q)  is  an  n  x  n  matrix  such  that  in  ( 1 )  H(q.  q)  =  Hm(q.  q)q. 

After  sufficient  learning  time,  the  dynamic  equations  of  the  manipulator  are  known. 
A  controller  of  the  form 

T  =  I0{kpe  +  kve  +  &)  +  W0(q,  q)  +  G0(q)  (9) 


Fig.  3.  Flowchart  of  unsupervised  learning  algorithm. 
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may  then  be  employed  and  the  resulting  system  is  equivalent  to 

q  =  kpe  +  kve  +  qj.  (10) 

Note  that  this  process  does  not  resuppose  the  existence  of  an  omnipotent  controller 
that  is  capable  of  accurately  tracking  a  trajectory,  but  rather  it  starts  out  with  a  simple 
controller  capable  of  performing  a  limited  task  and  uses  learned  knowledge  of  the 
manipulators  dynamics  to  build  a  controller  capable  of  controlling  the  manipulator 
at  high  speeds  over  the  entire  work  space.  The  unsupervised  learning  algorithm  can 
be  summarized  as  shown  in  Figure  3. 

2.4.  SIMULATION  RESULTS 

The  unsupervised  learning  algorithm  is  tested  via  a  simulated  two  DOF  manipulator. 
The  algorithm  used  for  these  results  differs  from  the  one  presented  in  the  discussion 
in  that  the  external  path  planner  for  the  learning-production  phase  was  not 
implemented.  Only  the  exploratory  schedule  planner  for  the  purely  learning  phase  was 
used.  Also,  in  this  example  we  assume  the  presence  of  a  suitable  joint  acceleration 
sensor  so  that  we  may  identify  the  inertial  compensation  terms.  The  two  DOF  planar 
manipulator  shown  in  Fig.  4,  consists  of  two  links  of  length  dt ,  d \  and  mass  m, ,  nu . 
The  mass  of  each  link  is  assumed  to  be  concentrated  at  the  end  of  the  link. 


The  dynamic  equations  for  this  manipulator  are 

=  fn£^i  +  I\i£5i  +  H{(20{^2  +  (Zf;)  +  b\  +  G|,  (11) 

T2  —  +  A:02  -  H\0\  4*  b2  +  G2  (12) 

where  /„  =  (m,  +•  m2)d}  +  m2d;  +  Im.d^d,  cos  (0,),  (13) 

7,2  =  m2[di  +  d{d,  cos  (02)],  (14) 

I::  =  m2d;,  (15) 

=  -  nudxd2  sin  (0:),  (16) 

G,  =  (m,  +  m:)gdt  sin  (0,)  +  m:gd:  sin  (0,  +  0,),  (17) 

G,  =  m2gd:  sin  (0,  +  02),  (18) 


l;ig.  4.  Two  degrees  of  freedom  planar  manipulator. 
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which  may  be  rewritten  to  provide  general  forms  of  linearly  separable  nonlinear 
subsystems  as 


0|,0,  + 

01:  cos  (0j)#,  + 

0|}02  + 

054 

COS  (0:)0:  + 

+  IK,5  sin 

(02)0,02  +  0,6 

sin  (0,)0 

i  + 

0i:0i  + 

+  01s  sin 

(0,)  +  W„  sin  (0,  +  0;). 

(19) 

0':i#i  + 

IK,  cos  (0,)0,  + 

02302  + 

0:j 

sin  (0,)#i  + 

+  023  02 

+  IK«  sin  (0,  +  0,). 

(20) 

Note  that  for  convenience  we  are  employing  the  exactly  computed  values  of  the 
functions  rather  than  network  modules  which  have  learned  the  functions.  In  a 
similar  way  to  that  used  by  Kawato  et  al.  (1987),  we  employ  a  network  consisting  of 
a  linear  combiner  of  the  subsystems.  However,  in  contrast  to  Kawato  et  al.  (1987), 
we  use  a-priori  knowledge  of  the  manipulator’s  dynamics  to  excite  only  specific 
subsystems  of  the  dynamics.  Isolating  a  subset  of  the  manipulator’s  dynamics 
decreases  the  dimension  of  the  weight  space  to  be  searched,  which  reduces  learning 
time  and  increases  the  likelihood  of  a  convergence  of  the  weight  vector  to  the 
optimum.  We  have  found  this  to  be  supported  by  our  simulations. 

The  optimal  values  for  the  weights  as  calculated  from  the  dynamic  equations  using 
the  values 


tn, 

=  nu  =  10.0  kg, 

d, 

=  (/,  = 

1.0  m, 

bt  =  i 

f>,  =  5.0  kg/s. 

0„ 

=  30.0. 

0|2 

= 

20.0. 

0,3 

=  10.0. 

0,4  = 

10.0. 

0,5 

=  -20.0, 

0,6 

= 

-  10.0. 

0,7 

=  5.0, 

= 

196.2. 

0„ 

=  98.1. 

05. 

= 

10.0. 

I K, 

*  10.0, 

023  » 

10.0. 

=  10.0. 

025 

= 

5.0, 

026 

=  98.1. 

The  final  weights  arrived  at  by  application  of  the  learning  algorithm  are 


IK,,  =  29.51. 

0*12  = 

19.28. 

IKj,  =  9.79. 

0|4  =  9.47, 

IK, 5  =  -19.28, 

0,6  = 

-9.97, 

IKi7  =  4.61. 

I V„  =  196.19. 

0,„  =  98.10, 

021  = 

10.06. 

IK,  =  10.20, 

IK,  =  10.12. 

IK4  =  10.22, 

II 

5.01 

IK6  =  98.0. 

In  total.  240  390  iterations  of  the  LMS  algorithm  were  used  for  taining.  In  determin¬ 
ing  G(q).  300  presentations  of  a  115  example  training  set  was  needed.  Identification 
of  IK(<y)  required  171  presentations  of  a  590  example  training  set. 

Figures  5  to  8  shows  the  performance  of  the  controller  as  the  identified  compen¬ 
sation  terms  C(0.  0)  are  added  to  the  linear  controller 

T  =  Kr(0  -  0j)  +  Kv(0  -  &j)  +  C(0.  0). 
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where  Kp,  Kv  are  2  x  2  diagonal  matrices  with  elements  A',,  =  400.0.  A-,,,  =  100.0  and 
A,.,  =  40.0.  A-v,  as  20.0.  respectively.  The  figures  show  the  desired  angular  position 
(0j).  desired  angular  velocity  (0j),  actual  angular  position  (0)  and  actual  angular 
velocity  (0)  of  the  manipulator’s  joints  as  the  controller  attempts  to  track  the  given 
trajectory.  Figure  5  depicts  the  performance  before  any  learning  has  occurred.  The 
compensation  term  C(0.  0)  is  zero.  Figure  6  shows  the  performance  of  the  con¬ 
troller  when  the  gravitational  compensation  terms  have  been  found  and  applied.  The 
compensation  term  is  C(0,  0)  =  (7O(0),  where  <7„(0)  is  the  output  of  the  (7(0) 
compensation  network.  Figure  7  shows  the  performance  of  the  controller  when  the 
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gravitational,  centripetal,  Coriollis  and  viscous  friction  compensation  term  is 
C(0.  0)  =  Wo(0,  0)  +  Go(0),  where  W'o(0,  0)  is  the  output  of  the 
W(0,  0)  compensation  network.  Figure  8  depicts  the  performance  of  the  con¬ 
troller  when  all  compensation  terms  have  been  found  and  applied.  The  controller 
is  now 

T  =  /w(Av(0  -  0(/)  +  kv(&  -  &d))  +  in(0.  &)  +  G(0). 

where  /„(.)  is  the  output  of  the  1(0)  compensation  network  and  kpl .  A',.| .  kr,  and  kl2 
are  as  previously  defined. 
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3.  Supervised  Neurocontroller  (SNC) 

We  now  describe  the  architecture  of  a  supervised  neurocontroller.  The  SNC  archi¬ 
tecture.  as  shown  in  Figure  9.  consists  of  a  teacher,  the  trainable  controller  and  the 
controlled  process. 

The  controlled  process  is.  in  general,  a  nonlinear  dynamic  system  with  unknown  or 
partially  known  dynamics.  The  variables  describing  the  state  of  the  system  (i.e. 
position,  velocity,  etc)  are  obtained  through  suitable  sensors  and  are  provided  to  both 
the  teacher  and  trainable  controller.  The  teacher  describes  the  desired  performance  of 
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the  SNC  by  providing  examples  of  the  control  commands  (<0  required  to  successfully 
control  the  process,  Note  that  the  teacher  may  be  another  automated  process  or  a 
human  that  is  capable  of  generating  the  correct  control  commands.  The  trainable 
controller  is  a  neural  network  architecture  suitable  for  supervised  learning,  prior  to 
training,  as  the  teacher  controls  the  process,  the  control  signals  as  well  as  the  state  of 
the  process  are  sampled  and  stored  for  use  in  training  the  network.  During  training, 
the  stored  process  state  samples  are  used  as  the  input  to  the  network  and  the 
corresponding  stored  control  signal  is  used  as  the  desired  output  of  the  network.  After 
successful  completion  of  training,  the  network  has  discovered  the  basis  for  the 
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Fig.  9.  SNC’s  architecture. 


mapping  from  the  input  to  the  output  and  will  respond  with  the  correct  control 
command.  The  teacher  is  then  removed  and  the  network  controls  the  process. 

3.1.  EXAMPLE:  CONTROL  OF  A  CART-POLE  SYSTEM 

We  have  tested  the  SNC  architecture  on  a  simulated  four-dimensional  nonlinear 
dynamic  system.  The  system  chosen  is  the  cart-pole  system  (or  broom  balancer)  which 
has  been  the  subject  of  previous  research  by  Widrow  and  Smith  (1964),  Widrow 
(1987)  and  Barto,  Anderson  and  Sutton  (1983).  In  contrast  to  the  previous  work,  we 
provide  continuous  rather  than  binary  outputs  (see  Guez  and  Selinsky  (1988)  for 
additional  details  and  experiments  with  the  cart-pole  system).  The  system,  as  shown 
in  Figure  10,  consists  of  an  inverted  pendulum  of  length  2 L  and  mass  m  mounted  on 
a  cart  of  mass  M. 

The  system  operates  in  the  vertical  plane  and  is  controlled  by  the  force  u.  Assuming 
the  pole  is  a  narrow  rod  of  uniform  composition,  no  actuator  dynamics  or  friction  at 
the  pivot,  the  dynamics  of  the  system  are  described  by 

0  =  -jj-  (g  sin  0  -  X  cos  0),  (21) 

y  =  "’(£  sin  002  —  sin  20)  -  fX  +  u 
M  +  w(l  -  $  cos:  0) 

The  state  variables  of  the  system  are  the  cart’s  position,  the  cart’s  velocity,  the  pole’s 
angular  position  and  the  pole's  angular  velocity,  respectively 

Z  =  [Z,  X,  0,  0]r.  (23) 

We  choose  a  feedforward  network  with  two  hidden  layers  for  the  controller.  The 
network  contains  4  neurons  in  the  input  layer.  16  in  the  first  hidden  layer.  4  in  the 
•,econd  hidden  layer  and  1  output.  The  activation  of  the  input  layer  neurons  are  linear, 
all  others  use  a  sigmoidal  activation  function. 
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Fig.  10.  Cart-pole  system. 


The  state  variables  of  the  cart-pole  system  were  used  as  inputs  to  the  network.  The 
output  of  the  network  was  scaled  to  reflect  a  continuous  value  of  from  +  1  to  -  1 
which  is  designated  to  be  the  normalized  control  signal  un.  The  normalized  control 
signal  represents  the  signed  fraction  of  the  total  force  Ft  (where  Fx  is  constant)  that 
is  to  be  applied  to  the  carts  as  the  control  signal  u ,  in  other  words  u  =  We  note 
that  the  inputs  and  outputs  are  continuously  valued,  not  binary.  Only  the  dynamic 
range  of  the  output  has  been  limited,  any  real  value  was  allowed  for  the  inputs. 

The  training  data  consists  of  periodic  samples  of  the  cart-pole  system’s  state  and 
the  normalized  control  signal  from  the  teacher.  The  training  samples  are  recorded  as 
the  teacher  returns  the  cart-pole  system  to  the  origin  of  the  state  space  from  some 
nonzero  initial  state.  Training  data  is  typically  recorded  over  a  1  to  2  minute  period 
of  operation.  The  network  is  trained  offline  using  the  Back  Error  Propagation  (BEP) 
algorithm  (see  Rumelhart,  Hinton  and  Williams  (1986))  and  the  recorded  training 
data. 

3.2.  SUPERVISED  LEARNING  WITH  A  LINEAR  TEACHER 

An  automated  teacher  consisting  of  a  linear  control  law  is  first  used  to  train  the  SNC. 
Although  a  linear  combiner  could  have  been  used  to  duplicate  the  control  law.  we 
have  included  this  example  to  show  that  a  multilayer  network  can  be  trained  using  a 
wide  range  of  control  laws.  The  cart-pole  system  (Equations  (21)  and  (22))  were 
linearized  about  0  =  0  (see  Kwakernaak  and  Sivan.  1972).  The  linearized  dynamics 
used  to  formulate  the  control  law  are 

•V  =  i(«-/.V).  (24) 

.V/ 

0  =  A  (g0  -  *)■ 


(25) 
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state-control  space.  Thus  we  conclude  that  a  multilayer  feedforward  network 
can  be  used  to  generate  linear  control  in  a  local  neighborhood  of  the  state  space 
origin. 

3.3.  SUPERVISED  LEARNING  WITH  A  NONLINEAR  TEACHER 

The  linear  teacher  used  in  the  previous  section  was  derived  by  linearizing  the  cart-pole 
system's  dynamics  about  0  =  0,  thus  it  is  only  applicable  around  the  state  space 
origin.  An  automated  teacher  that  is  stabilizing  in  the  entire  state  space  for  the 
nonlinear  system  Equations  (21)  and  (22),  can  be  obtained  by  the  use  of  a  Feedback 
Linearizing  and  Decoupling  Transform  (FLDT)  as  used  previously  in  Section  2. 
Rewrite  the  system  Equations  (21)  and  (22)  in  the  form 


0  =  ht  —  luX% 

(27) 

f  +  u 

X  ~  A  ’ 

(28) 

where 

/ii  =  ■—g  sin  0, 

(29) 

h,  =  —  cos  0,  (30) 

4  L 

f  =  m(L  sin  002  -  ft  sin  20)  -  fX,  (31) 

/,  =  M  +  m(l  -  icos20).  (32) 

Then  the  FLDT  control  law  that  is  to  be  used  as  the  automated  teacher  is 

«  =  r  ^  +  *i<0  -  0j)  +  *20  +  c,(X-  Xj)  +  c:X)  -  f,  (33) 

with  kt  =  25.  k2  =  10,  c,  =  1  and  c2  =  26. 

Examples  generated  by  the  above  FLDT  control  law  (17)  were  used  to  train 
a  SNC.  Figure  12  compares  the  performance  of  the  FLDT  teacher  and  its 
corresponding  SNC.  The  SNC  used  was  trained  for  80000  iterations  of  the  BEP 
algorithm. 

Although  the  nonlinear  teacher  and  corresponding  trained  SNC  do  not  follow 
trajectories  that  match  as  closely  as  in  the  case  of  the  linear  teacher,  they  both  exhibit 
the  same  characteristic  transient  and  steady  state  behavior.  Subsequent  tests  also 
show  similar  performance.  We  conclude  that  the  SNC  was  able  to  learn  and  generate 
stable  control  from  the  nonlinear  teacher. 
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Fig.  12.  Comparison  of  nonlinear  teacher  and  corresponding  SNC. 


3.4.  SUPERVISED  LEARNING  WITH  A  HUMAN  AS  THE  TEACHER 

In  the  previous  sections,  the  exact  form  of  the  control  law  that  was  used  as  the  teacher 
was  a-priori  known.  In  this  section,  we  used  examples  generated  by  the  responses  of 
a  human.  In  training  with  a  human  teacher,  the  exact  form  of  the  control  law  is  not 
known  as  it  has  been  acquired  by  the  teacher  though  direct  interaction  with  the  system 
rather  than  through  formal  analysis  of  the  system’s  dynamics.  To  generate  the 
training  data  the  teacher  observes  the  cart-pole  system  on  the  computer  screen  and 
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returns  the  system  to  the  state  space  origin  by  application  of  the  control  signal  via  an 
input  device.  The  results  reported  are  for  Human  Trained  Supervised  Neurocontrollers 
(HSNC)  that  have  been  trained  for  40000  iterations  of  the  BEP  algorithm.  Figure  13 
depicts  the  angle  of  the  pendulum  (0  in  radians),  the  position  of  the  cart  (X  in  meters) 
and  the  normalized  control  signal  ( u„ )  as  a  trained  HSNC  stabilizes  the  system  from 
the  initial  state  Z  =  {2,  0,  -0.4.  0)r. 

Figure  1 3  demonstrates  that  the  HSNC  was  able  to  learn  to  generate  stable  control 
from  the  examples  of  the  humans  response. 


O  5  10  15  20  25  30 


tfm*  (j»c) 


Fig.  13.  Results  of  training  HSNC 
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3.5.  SUPERVISED  LEARNING  WITH  A  HUMAN  AS  THE  TEACHER  USING  FILTERED 
DATA 

When  training  a  network  using  examples  that  have  been  generated  by  a  human 
teacher,  we  have  to  take  into  account  that  the  teacher  performs  some  processing  of 
.the  information  before  applying  the  control  signal.  State  information  is  obtained  by 
the  teacher  through  the  image  on  the  computer  screen,  then  the  control  signal  that  is 
applied  is  shaped  not  only  by  the  teachers  intention,  but  also  by  physiological 
characteristics.  Characteristics  such  as  response  delay  (reaction  time)  and  hand 
tremor  can  be  a  limiting  factor  in  a  human’s  ability  to  perform  manual  tracking  tasks 
(see  Guez  and  Boykin,  1982).  In  our  studies,  the  response  delay  has  been  shown  to 
play  a  significant  role  in  the  ability  of  a  network  to  learn  from  a  human.  To  correct 
this  problem  we  have  filtered  the  training  data  to  account  for  the  simplest  model  of 
the  human  transfer  function 

GO’Uan  =  e~;“’7'h“n’*\  (34) 

which  is  a  time  delay.  The  filtering  consists  of  shifting  the  sampled  normalized  control 
signal  by  the  estimated  reaction  time  of  the  teacher.  In  our  experiments,  the  samples 
are  recorded  every  50  ms.  The  z'th  sample  of  the  state  variables  was  paired  with  the 
(/  +  2)th  sample  of  the  normalized  control  signal,  to  produce  a  total  shift  of  100  ms. 
Figure  14  shows  the  performance  of  an  HSNC  that  was  trained  using  the  same  data 
as  in  the  previous  section,  only  the  data  was  filtered  as  described  above.  The  initial 
state  is  the  same  as  in  Figure  13. 

By  comparing  Figures  13  and  14,  we  see  that  the  performance  of  the  network  that 
was  trained  using  the  filtered  data  is  much  improved. 

Furthermore,  as  can  be  seen  in  Figure  1 5,  the  performance  of  the  network  surpasses 
that  of  the  teacher.  This  is  reasonable  in  that  a  teacher  with  an  inherent  time  delay 
would  have  to  have  perfect  knowledge  of  the  cart’s  dynamics  to  compensate  for  the 
delay.  By.  shifting  the  data,  we  allow  the  network  to  learn  the  intent  of  the  teacher 
rather  than  what  physiological  limitations  allow.  Also,  note  that  the  network  was  able 
to  learn  the  control  from  very  noisy  training  data  as  would  be  expected  in  a  realistic 
situation. 

In  these  experiments,  the  object  is  to  return  the  cart-pole  system  to  the  origin  of  the 
•state  space.  The  controller  trained  with  the  human  teacher  and  the  filtered  data  has 
learned  to  stabilize  the  system,  but  it  has  not  learned  to  return  the  cart  to  the  origin 
(X  =  0).  We  attribute  this  to  the  examples  not  providing  a  representative  of  all  aspects 
of  the  control  problem.  This  problem  was  manifested  only  in  the  controller  trained 
with  the  filtered  human  examples.  To  correct  this  problem,  a  -0.33  meter  bias  has 
been  added  to  the  cart  position  input  of  the  HSNC's  reported  in  the  rest  of  this  article. 


Comparison  of  HSNC  and  FLDT  Controllers 

We  now  compare  the  performance  of  the  FLDT  teacher  and  the  human  trained 
controller.  Note  that  in  this  comparison  the  FLDT  control  law  was  derived  using  the 
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Fig.  14.  Results  of  training  HSNC  using  tillered  data. 


full  knowledge  of  the  cart-pole  system’s  dynamics.  The  HSNC.  on  the  other  hand  was 
trained  using  only  the  knowledge  that  the  human  teacher  acquired  through  inter¬ 
action  with  the  cart-pole  system.  Figures  16  to  19  compare  the  performance  of  the 
HSNC  and  FLDT  controllers  for  various  initial  states. 

The  performance  of  the  HSNC  is  comparable  to  FLDT  when  the  region  of 
operation  is  within  the  region  that  the  training  examples  represented.  Note  that 
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time  (see) 

Fig.  15.  Training  data  provided  by  the  human  teacher. 

even  though  the  performance  is  degraded  in  Figure  19  the  network  shows  some 
generalization  of  its  learning  and  is  able  to  stabilize  the  system. 

Comparison  of  Robustness  of  HSNC  and  FLDT 

We  compare  the  sensitivity  of  the  HSNC  and  FLDT  controllers  to  parameter  vari¬ 
ations.  Figures  20  to  24  shows  the  angle  of  the  pendulum  as  each  controller  returns 
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the  cart-pole  system  to  the  state  space  origin  from  the  initial  state  Z  =  ( 1  5, 0.  -  0.5. 
0)r.  Each  figure  corresponds  to  one  of  the  parameters  being  varied. 

From  Figures  20  to  24.  we  see  that  for  variations  in  cart  mass,  pole  mass,  force  gain 
and  friction  the  HSNC  is  more  robust  than  the  model  based  FLDT  controller. 

4.  Conclusions 

We  have  outlined  a  method  whereby  the  use  of  a-priori  knowledge  has  resulted  in  a 
previously  supervised  learning  algorithm  to  be  made  into  an  unsupervised  one.  The 
use  of  a  hierarchical  neural  network,  one  where  there  are  three  layer  network  modules 
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Fig.  17.  Comparison  of  performance  of  HSNC  and  FLDT.  Z  =  (-  1.5.  0.  0.3.  0)r 


which  are  combined  in  a  fourth  layer  and  which  allows  selective  attention  to  learn¬ 
ing.  has  the  potential  to  provide  natural  fault  tolerance,  robustness,  flexibility 
of  programming  and  high  speed  operation  when  implemented  n.  nardware.  We 
•  note  that  there  will  be  a  tradeoff  between  robustness  and  accuracy  when  using 
the  network  modules  as  functions  rather  than  the  exactly  calculated  functions. 
The  networks  can  learn  to  approximate  the  functions  but  will  not  reproduce  them 
exactly. 

We  have  also  shown  that  a  neural  network  can  be  taught,  using  a  supervised 
learning  paradigm,  to  generate  stable  control  for  a  nonlinear  dynamic  system.  The 
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Fig.  18.  Comparison  of  performance  of  HSNC  and  FLDT.  Z  =  (-1.5.0.  -0.5.0)'. 


results  of  training  with  a  human  teacher  demonstrate  that  stable  control  can  be 
learned  even  when  the  exact  form  of  the  control  law  is  not  known.  Furthermore,  the 
human  trained  adaptive  controller  was  shown  to  be  more  robust  with  respect  to 
parameter  variations  than  a  model  based  controller. 

We  conclude  from  these  results  that  neural  networks  using  both  supervised  and 
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Fig.  19.  Comparison  of  performance  of  HSNC  and  FLDT.  Z  =  (1.0.  -0.65.  0)'. 


unsupervised  learning  paradigms  can  be  a  basis  for  robust,  realtime,  adaptive  con¬ 
trollers.  Moreover,  we  conclude  that  a-priori  knowlege  of  the  controlled  process  and 
control  environment,  no  matter  how  little,  can  contribute  to  the  structural  design  of 
neurocontrollers.  Continuing  research  is  directed  towards  issues  of  neurocontroller 
stability.  ES  design,  and  suitable  network  architectures  (see  Selinsky.  1989). 
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Fig.  24.  Comparison  of  sensitivities  of  HSNC  and  FLDT  to  lengr.i  of  the  pole. 
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TWO  DEGREE  OF  FREEDOM  NEUROCONTROLLER 
Allon  Guez  and  izhak  Bar-Kana 

ECE  Dept.,  Drexei  University,  Philadelphia,  PA  19104 
Summary 

In  this  paper  we  combine  the  simple  technique  for  adaptive  robot  control  proposed  by 
[Sar-Kana  and  Guez  1989]  and  the  exploratory  learning  schedule  proposed  by  [Selinsky  and 
Guez  1989]  into  a  unified  two  degrees  of  freedom  (DOF)  robot  controller,  thereby  improving 
the  performance  of  both  approaches.  Both  approaches  are  first  summmarized  then  the  2  DOF 
controller  is  presented  and  the  motivation  for  its  design  is  explained. 
Keywords:Neurocontrol;  Learning;  Tracking. 

In  [Bar  kana  and  Guez  1989]  an  unsupervised  distributed  parallel  computing  architecture  is 
proposed  for  the  adaptive  control  of  nonlinear  dynamic  systems  of  the  class 


x  (t)  =  A(x)x(t)  +  B(x)uvi)  i  i  ) 
y(t)  =  C(x)x(t)  +  D(x)u(t)  (2) 

with  some  degree  or  other  of  uncertainty,  where  *0)  Rn  is  the  plant  state  vector,  y(t)  Rm  is 

the  output  vector,  and  u(t)  Rm  is  the  input  command  vector,  and  where  A(x).  B(x),  0*0,  and 
D(x)  are  uniformly  bounded  matrices  of  corresponding  dimensions. 

The  proposed  controller  consists  of  a  teacher  model  which  incorporates  the  knowledge 
regarding  the  desired  input/output  plant  response  as  well  as  the  repertoire  of  reference 
commands  that  the  system  may  be  subjected  to. 


The  teacher  dynamic  model  is  assumed  to  have  the  following  representation: 

x  t(t)  =  At(xt)xt(t)  +  Bt(xt)ut(t)  (3) 
y,(t)  =  Ct(xt)xt(t)  +  Dt(xt)ut(t)  (4) 

where  xt(t)  Rnt  is  the  state  vector,  yt(t)  Rm  is  the  output  vector,  and  ut(t)  Rm  is  the  input 
command  vector,  and  where  At(xt),  Bt(xt),  Ct(xt),  and  Dt(xt)  are  uniformly  bounded 
matrices  of  corresponding  dimensions.  It  is  emphasized  that  the  dimension  of  the  model  is 
unrestricted,  except  that  dim(yt)  =  dim(y)  =  m. 

The  parallel  distributed  adaptive  controller  receives  the  input  'features'  vector  f(uj, 
y)  and  generates  as  an  output  the  process  control  vector  u,  where 

u(t)  =  K(t)f(ut,  x,f  y)  (5) 

where  K(t)  is  the  adaptive  gain  matrix  of  appropriate  dimension.  Each  Ky  gain  monitors  the 
sensitivity  of  the  i-th  control  loop,  namely  Uj,  to  the  j-th  feature  of  the  system,  namely  fj(U(, 
xj,  y)  and  may  be  viewed  as  the  state  of  the  /;- th  neuron. 

The  Ky  gain  adjusts  its  value  independently  of,  and  simultaneously  with  all  other  gains, 
according  to: 


0 


Kjj(t)  =  Mjj(t)  +  Njj(t)  (6) 

Mjj(l)  =  a jj(y tj  -  yj  )fj  (7) 

Nij(l)  -  •  bijN  i|(t>+  Oij<y,j  -  Vi  )fj  (8) 

where  ajj,  by  and  gy  are  positive  constants. 

Bar-Kana  and  Guez  show  that  perfect  regulation  of  the  plant  may  be  obtained  in  the 
purely  deterministic  case.  Notice  however  that,  while  reducing  the  amount  of  prior  knowledge 
required  to  guarantee  stability  of  the  adaptive  systems,  this  simple  adaptive  controller  does 
not  take  into  account  any  valid  knowledge  on  the  specific  plant  to  be  controlled.  No  learning  is 
used  to  improve  performance  in  future  trials  or  to  make  the  control  assignment  easier.  The 
initial  adaptive  gains  are  always  zero,  as  no  better  initial  value  is  available,  and  also  the  range 
of  the  adaptive  gains  may  me  unnecessarily  large.  Indeed  the  main  purpose  of  this  simple 
algorithm  is  first  of  all  to  reduce  the  necessary  condition  to  mere  stabilizability. 

The  system  proposed  by  [Selinsky  and  Guez  1989]  for  robot  manipulator  control 
consists  of  the  Exploratory  Schedule  Generator,  Neurocontroller  and  associated  Learning 
Algorithm,  the  Robot,  and  a  mechanism  which  allows  selecting  between  the  User  /  Path  Planner 
and  ES  Generator  as  the  originator  of  the  desired  trajectory  (q^). 

The  neurocontroller  implements  the  control  law 


ti=  S  YjjCq.q.q^q^gj+Kdy  er. 
j=l 


i  =  1,2, ....  n  ,  (9) 


where  0  denotes  the  estimate  of  the  robot  parameters  0,  and  the  Kd ,j  are  constant 

weights  for  the  servo  portion  of  the  controller  .  Selinsky  and  Guez  indeed  use  learning  in  the 
adaptation  process.  Their  exploratory  schedules  also  guarantee  ...at  along  with  tracking,  the 
unknown  plant  parameters  are  finally  identified.  Yet,  in  order  to  get  satisfactory  results  for 
both  tracking  and  learning,  the  supplementary  gains  are  needed  in  (9).  The  results  of  both 

tracking  and  learning  are  strongly  affected  by  the  right  selection  of  these  parameters  ,  and  in 
practice  they  are  fixed  by  trial  and  error  procedures.  Furthermore,  if  one  selects  high  fixed 
gains  to  get  small  tracking  errors,  one  then  also  gets  high  noise  amplification  and  possibly  high 
cost  of  control  even  when  not  necessarily  needed.  An  adaptive  procedure  for  K^..  like  in  Bar- 

Kana  and  Guez,  will  fix  the  gains  as  a  function  of  the  tracking  errors.  It  thus  results  in  large 

gains  only  if  the  tracking  error  (in  our  case  represented  by  er )  tends  to  increase,  and  decrease 
afterwards,  fitting  thus  the  right  control  gains  to  the  right  situation. 

Instead  (  see  (Bar  Kana  and  Guez  1989-b]  for  details)  we  propose  the  2  DOF  controller 
shown  in  Figure  1. 
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Fig.  1 :  Two  Degrees  of  Freedom  NeuroControlier. 

The  identification  and  linearization  neurocontroller  algorithm  is  given  by  equations  (9), 
and  (10)  and  the  adaptive  servo  neurocontroller  algorithm  has  the  form  of  equations  (5),  (6), 
(7)  and  (8)  Main  Result.  It  can  be  shown  [Bar-Kana  and  Guez  1989-b]  that  global 
asymptotic  tracking  and  identification  as  in  [Selinsky  and  Guez  1989]  still  holds  even  for  the  2 
DOF  controller,  i.  e.,  when  the  servo  gain  is  time  varying.  That  is,  global  asymptotic  tracking  as 
well  as  global  asymptotic  identification  of  up  to  n  robot  parameters  can  be  guaranteed  with  our 
2  DOF  robot  controller  if 


t/^m=  [Rank  (Y(q,  q.qd.  qd))]  =  p. 
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ABSTRACT 

This  paper  describes  the  performance  of  a  two-dcgrcc-of-frccdom 
(2  DOF)  robot  neurocontroller.  The  main  theorem  suggests  that 
global  asymptotic  stability  is  guaranteed  despite  the  highly  nonlinear 
nature  of  the  closed-loop  dynamics.  Examples  demonstrating  the 
applicability  of  the  neurocontroller  arc  provided. 

1.  INTRODUCTION 

The  neurotontroller  proposed  here  is  based  on  a  combination  of 
the  architectures  proposed  in  (1),[  2],  [3],  and  [4]  and  demonstrates 
some  useful  features.  In  section  2  we  describe  the  controller 
architecture.  Section  3  presents  the  main  results.  In  section  4  wc 
provide  a  two-links  rigid  manipulator  example. 

2.  THE  ADAPTIVE  CONTROLLERS 

The  proposed  2  DOF  ncurocontrollcr  is  described  in  Figure  1. 


NS 


Wc  shall  develop  the  ncurocontrollcr  and  describe  its  performance 
for  a  rigid  n  DOF  robot  manipulator.  The  marhcmatical  notations  and 
the  underlying  component  algorithms  follow  closely  the  references 
[1]  and  [3]. 

The  robot  dynamic  model  is  assumed  as  follows: 

D(q)q  +  C(q.q)q  +  G(q)  =  x  (1) 


where 

D(q)  is  an  n  x  n  matrix  of  inertia!  terms, 

C(q,q)  is  an  n  x  n  matrix  of  coriollis  and  centrifugal 

terms , 

G(q)  is  an  n  x  1  vector  of  gravitational  terms, 

q  is  an  n  x  1  vector  of  joint  coordinates, 

x  is  an  n  x  1  vector  of  forccs/torqucs. 

n  is  the  number  of  degrees  of  freedom. 


Equation  (1)  can  also  be  expressed  as 

X  =  D(q)’q  +  C(q,q)  q+  G(q)  =  Y(q,q,  q,  q)0  (2) 

where  Y(q,q,  q,  q)  is  an  nxp  matrix  of  known  functions  and  0  is 
an  pxl  vector  of  weighting  constants  (Slotine  and  Li  (  3|).  When  (2) 
is  written  in  terms  of  the  individual  torques  at  each  joint,  it  can  be 
viewed  as  a  single  layer  linear  network,  where  the  inputs  to  the 

network  are  the  Yjj(.)  and  the  weights  are  the  0;  (Fig.  2).  Notice 
that  the  Yjj(*)  arc  transcendental  algebraic  functions  of  the 
manipulators  states  that  arc  a  priori  known  and  may  be  realized  via 
feedforward  neural  networks  that  arc  trained  offline  with  a  suitable 
learning  algorithm . 


Let  qd,  qd ,  q,  designate  the  desired  trajectory.  We  define 
ejt;  =  q(t)-qd(t)  the  joint  coordinate  error,  and  following  [  3],  wc 

define  the  virtual  reference  trajectory  qr,  qr,  q,  and  the  virtual 

trajectory  velocity  error  s  =  q 

q  =  qj  -  A  c,  q  =  q,  -  A  q  s  =  q  -  q  =  e  +  Ac  (3) 

where  A  is  a  positive  definite  matrix  with  constant  coefficients. 

The  neurocontroller  provides  the  control  command  x  =  xNp+  xNS. 
where  xNS  is  the  output  of  the  servo  neural  net  (NS)  and  xNF  is  the 
output  of  the  forward  neural  net  (NF). 

We  choose  the  architecture  of  NF  to  implement  the  control  law 

tNF=^  Yij(q,q,qr.’4)9j  -  >  =1-2 . *n  (4) 

j  =  ! 

where  G  denotes  the  estimate  of  0,  and  is  implemented  as  a  synaptic 
weight  in  NF  with  the  Slotine  and  Li  [3]  learning  (adaptation)  rule: 
n 

©j  =  >  4r*  4-)  $.t  *  =L2 . >P-  (5) 

i=l 

Notice  in  equation  (5),  the  similarity  to  the  LMS  learning  rule 
(sec  Widrow  and  Stearn  [5])  where  the  weight  change  is  proportional 

to  the  error  e  and  the  input  features  X.  In  equation  (5)  the  input 
features  are  the  Yjj(.)  functions  and  the  error  is  s. . 

Figure  2  shows  the  internal  structure  of  the  neurocontrollcr  for 
joint  i.  Each  feedforward  network  module  is  trained  to  provide  one  of 

the  Yjj(q,q,  qr.'4)  functions. 

The  Y|j(q,q,  qr,qr)  are  implemented  by  neural 
networks  providing  the  opportunity  to  capture  nonrigid 
dynamics  effects  when  their  number  overestimates  n. 


Figure  2:  Neural  implementation  of  NF. 


The  servo  neurocontrollcr  NS  implements  the  Bar-Kana 
adaptative  rule  [1],  [2] 


Kd..(t)*Kdiij(t)  +  Kdp.j(t) 
Ka  ij,(0=  a.ssT-p..KdI..(t) 


(3) 


'V 


^Jpjj(0  =  Yiiss 

tNS  =  ^ds 

Notice  that  without  the  dynamic  NS  and  without  the  neural 
implementation  of  the  NF,  the  neurocontroller  reduces  to  the  adaptive 
controller  of  Slotine  and  Li  [3].  For  constant  servo  feedback 
(constant  Kd)  they  proved  global  stability.  We  shall  extend  their 
result  and  prove  that  global  stability  is  maintained  with  our  adaptive 
servo  (6)-(9),  and  with  added  benefit. 

In  (2)  an  unsupervised  distributed  parallel  architecture  is 
proposed  for  the  adaptive  control  of  nonlinear  dynamic  systems  of 
the  class 

x  (t)  =  A(x)x(t)  +  B(x)u(t)  ( 10) 

y(t)  =  C(x)x(t)  +  D(x)u(t)  (11) 

with  some  degree  or  other  of  uncertainty,  where  x(t)e  £n  is  the  plant 

state  vector,  y(t)e  Rm  is  the  output  vector,  and  u(t)e  Rm  is  the  input 
command  vector,  and  where  A(x),  B(x),  C(x),  and  D(x)  are 
uniformly  bounded  matrices  of  corresponding  dimensions. 

The  controller  proposed  in  (2]  is  shown  in  Figure  3  below:  It 
consists  of  a  teacher  model  which  incorporates  the  knowledge 
regarding  the  desired  input/output  plant  response. 


REFERENCE  CONTROL  OUTPUT 


Figure  3:  Neurocontroller  Proposed  in  [21. 

The  teacher  dynamic  model  is  assumed  to  have  the  following 
representation: 

Xj(t)  =  A,(x,)x,(t)  +  B,(x,)u,(t)  (12) 

y,(0-Cl(xI)*l(t)  +  Dl(x,)u,(t)  (13) 

It  is  emphasized  that  the  dimension  of  the  model  is  unrestricted, 
except  that  dim(yt)  =  dim(y)  =  m. 

The  parallel  distributed  adaptive  controller  (Fig.  4)  receives  the 
input  'features’  vector  f(u,,  x,,  y)  and  generates  as  an  output  the 
process  control  vector  u(t),  where 

u(t)  =  K(t)f(u„  x„  y)  (14) 

where  K(t)  is  the  adaptive  gain  matrix  of  appropriate  dimension. 
Each  K|j  gain  adjusts  its  value  independently  of.  and  simultaneously 
with  all  other  gains,  according  to: 


Proof:  Let  us  select  tne  positive  cetimte  quaoruttc 
function  of  the  parameters  s,  3,  Kdl 

V=|sTD(q)s+  ^  0TktS  +  j  trace  [Kj,(t)a*  *  K  J|(t)]  (18) 

Using  the  fact  that  for  rigid  robots  xT[  D  -  2C]x=0  Vx,  we 
obtain  after  some  manipulations  [7] 

V=  -cdisll2  -  pTrace  [Kdl(t)a',Kj1(t)]  <0Vs/0,  Kdl*0. 


Therefore  lim  llsll  =  0  implying  q(t)  -» qd(t). 
t-*« 


Figure  4  •  Neurocontrollcr  Architecture 
a)  General  Block  Diagram  b)ij-th  Neuron  Architecture. 

4.  ROBOTIC  EXAMPLE 


Kjj(t)  =  Mjj(t)  +  Njj(t) 

(15) 

/ - N 

» 

>c 

n 

s’- 

(16) 

^  Njj(t)  =  -  PijNij(t)+  Yjjfy,.  -  yj  )f. 

(17) 

where  otjj,  Py  and  Yy  are  positive  constants.  Notice  in  equations  (17) 
the  similarity  to  the  LMS  learning  rule  (sec  Widrow  and  Steam  (  5)) 
where  the  weight  change  is  proportional  to  the  error  e  and  the  input 
features  X.  in  equation  (17)  the  input  features  .re  the  f,  functions 

aru.  -he  error  is  yt.  -  y;.  The  fi-term  in  (17)  is  added  to  avoid 
divergence  of  the  integral  adaptive  gains  in  the  presence  of 
disturbances  and  uncertainties  [11. 

3.  MAIN  RESULT 

Theorem:  The  dynamic  system  described  by  equations  (l)-(9) 
and  represented  in  figure  1  provides  global  asymptotic  tracking  for  all 

positive  a,  (3,  y,  and  for  all  twice  differentiable  trajectories  qd(t). 


We  simulated  the  exact  system  as  in  (4)  with  our  new  controller. 
The  equations  of  morion  are: 

Tj  =  Dt|q|  +  D12q2  +  D(q2  +  2i}|q2)  +  D[ 

t2  =  D|2qi  +  D22t)2  •  D  q 2  +  D2 
where 

Lj  =  U,  =  1.0  m,  trtj  =  :n2  =  10  kg,  g=9.81  nt/s2, 

Dn  =  (mj+  m^L/j  +m2L2  +2m2L,L2COsq2. 

D12  =  tn2L2  +  m2L1L2cosq2, 

D22=  m2L2 , 

D  =  -m2L[L2sinq2, 

D  |  =(m,  +  m2)gL,  sinq ,  +m2gL2Sin(q ,  +q2), 

D2  =  m2gL2sin(q,+  q2). 

With  the  above  numerical  values,  the  plant  parameters  have  the 
values  ©i=20,  02=1O,  03=1O,  04=196.2,  05=98  1  The 
performance  of  the  two-degree-of-freedom  robotic  control  system  is 
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Figure  5.  Adaptive  Servo  Gains:  (a)  Acceleration  schedule: 
(b)  Trajectories  q^O),  qi(t);  (c)  Tracking  Errors  ej(i)i  c2(t); 

<d)  Controller  Gains  Kdu(t), 


represented  in  Figure  5  with  identification  coctficicnts  KJ(|=U  001 

and  with  adaptation  coefficients  <*=100000.  ,i-*»  I.  y=HXK).  i :.c 
parameters  adjustment  is  only  as  good  as  the  tracking  system  is 
affected.  We  do  not  introduce  any  persistant  excitation,  that  is  usually 
used  to  guarantee  convergence  of  the  parameters.  However,  at  idle 
times  between  production  times,  the  robot  can  execute  exploratory 
schedules  [4]  in  order  to  bring  the  identified  parameters  as  close  as 
possible  to  the  actual  values.  Afterwards,  the  robot  performs  any 
desired  assignment,  and  then  NF  and  NS  work  in  tandem  to  execute 

it.  Figure  5a  shows  the  acceleration  process  Figure  5b 
represents  the  desired  qdl  and  the  actual  q,.  w.uch  at  this  scaling 
appear  to  be  identical.  During  this  experiment  we  kept  qd;  =  0. 
Figure  5c  represents  the  small  tracking  errors  with  a  more  appropriate 
scaling.  Figure  5d  shows  the  behavior  of  the  adaptive  gains  K,,tw 
and  K22O).  It  can  be  seen  how  the  adaptive  gams  move  up-and-down 
and  maintain  small  tracking  errors.  They  may  even  vanish,  if  they  are 
not  needed  any  more. 


The  performance  of  the  robotic  control  system  with  constant 
•ervo  gains  is  strongly  affected  by  the  selection  of  the  constant 
miitrol  gains  Kd-..  with  identification  coefficients  Kaii=0.001 
and  with  low  gains  Kdu  =  100.,  Kj22  =  50.,  both  the  identification 
and  the  tracking  errors  are  bad  (6).  With  high  constant  gains  = 
1000.,  Kq22  =  500,  the  results  improve  and  arc  very  similar  to  the 
results  presented  in  Fig.  5.  However,  the  servo  gains  remain  always 
high,  even  if  they  are  not  needed. 

We  wanted  the  servo  gains  to  drop  when  they  are  not  needed 
any  more,  because  we  expected  the  controller  with  high  gains  to  be 
strongly  effected  by  disturbances,  even  after  the  plant  parameters  are 
actually  identified.  Figure  6  shows  indeed  that  the  identification  is 
'•-nbly  disturbed^by  noise,  and  tracking  is  bad  when  high  servo 
.-..as  Kdlt  =  1000.,  Kd22  =  ^00  are  used  with  measurement 
disturbances  of  0.1  sin  20t  and-0.2sin30t,  correspondingly.  The 
surprize  came  when  the  adaptive  control  gave  also  similar  (bad) 
results.  This  result  can  be  explained  by  the  deterioration  of  the 
identification  due  to  the  disturbance,  which  introduces  a  noisy  control 
signal  xNFto  the  robot. 


Fig.  6.  Constant  Gains  with  disturbance:  (a)  Tracking  Errors 
e,(t).  c:(t);  (b)  A<&,(t),  A<J>2(t),  A02(t);  (c)  A04(t),  Ad>5(t); 


Therefore,  we  slowed  the  rate  of  identification  to  K3..=0.1 
and  the  results  shown  in  figure  7  with  adaptive  servo  gains  show 
good  identification  and  reasonable  tracking  in  spite  of  the  unknown 
disturbance.  The  adaptive  gains  Kn(t)  and  K::it)  move  up-and- 
down  in  order  to  maintain  small  tracking  errors'.  The  system  thus 
combines  a  slow  long-term  memory  which  is  not  affected  bv 
temporary  disturbances,  and  a  fast  short  ten:: .  aapuve  controller  to 
overcome  transients  and  uncertainties.  If  the  „:.,turbance  vanishes, 
the  adaptive  gains  also  eventually  decrease  or  \  amsh,  and  the  control 
signal  will  be  given  almost  entirely  by  the  eorrcctlv  identified 
parameters. 


5.  CONCLUSION 

A  two-degrees-of- freedom  robot  ncur.  controller  provides 
opportunities  for  noise  filtering  and  dist,.-*-  nice  rejection  not 
available  with  constant  gain  servo  neurocontro.ier 
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ABSTRACT 

This  paper  combines  the  simple  technique  for  adaptive 
robot  control  proposed  by  Bar-Kana  and  Guez  ana  the 
exploratory  leming  schedule  proposed  by  Selinsky  and  Guez 
into  a  unified  two  degrees  of  freedom  (DOF)  robot 
controller,  thereby  improving  the  performance  of  both 
approaches.  We  first  briefly  summarize  both  approaches. 
We  then  propose  our  new  2  DOF  controller  and  explain  our 
motivation  for  its  design.  Next  we  present  the  main  theorem 
at.d  conclude  with  a  two-joints  rigid  manipulator  example. 

1.  INTRODUCTION 

In  this  paper  we  combine  the  simple  technique  for 
adaptive  rooat  control  proposed  by  Bar-Kana  and  Guez  and 
th ;  exploratory  learning  schedule  proposed  by  Selinsky  and 
Guez  into  a  unified  two  degrees  of  freedom  (DOF)  robot 
controller,  thereby  improving  the  performance  of  both 
approaches.  Both  approaches  are  first  summmarized  in 
Chapter  2.  The  new  2  DOF  controller  is  presented  and  the 
motivation  for  its  design  is  explained  in  Chapter  3.  Chapter  4 
presents  the  main  theorem  and  Chapter  6  conclude  with  a 
two-joints  rigid  manipulator  example. 

2.  THE  ADAPTIVE  CONTROLLERS 

In  [1]  an  unsupervised  distributed  parallel  computing 
architecture  is  proposed  for  the  adaptive  control  of  nonlinear 
dynamic  systems  of  the  class 

x(t)  =  A(x)x(t)  +  B(x)u(t)  (1) 

y(t)  =  C(x)x(t)  +  D(x)u(t)  (2) 

with  some  degree  or  other  of  uncertainty,  where  x(t)e  Rn  is 
the  plant  state  vector,  y(t)eRm  is  the  output  vector,  and 

u(t)eRm  is  the  input  command  vector,  and  where  A(x), 
B(x),  C(x),  and  D(x)  are  uniformly  bounded  matrices  of 
corresponding  dimensions. 

The  proposed  controller  is  depicted  in  Figure  1  below:  It 
consists  of  a  teacher  model  which  incorporates  the 
k  .owledge  regarding  the  desired  input/output  plant 
ri  :ponse  as  well  as  the  repertoire  of  reference  commands 
tl  it  the  system  mav  be  subjected  to. 

The  teacher  dynamic  model  is  assumed  -o  rave  the 
ft  lowing  representation: 

*t(t)  *  At(x^xt(t)  +  B,(x  Jut(t)  (3) 

yt(t)  «  QfxtXW  +  D^xjUtO)  (4) 

v.  tere  x,(t)s Rn*  is  the  state  vector,  y,(t)<s  re  output 

V  ictor,  and  u((t)6  Rm  is  the  input  command  vector,  and 
where  A^),  Bt(x,),  C,(x,),  and  Dt(x,1  are  ur.hormly 
bounded  matrices  of  corresponding  dimensions.  It  is 
emphasized  that  the  dimension  of  the  model  is  unrestricted, 
except  that  dim(y,)  =»  aim(y)  =  m. 

This  paper  is  based  on  research  supported  in  part  by  AFOSR 
Grant  No.  890010  and  by  Drexel  University's  Stein 
Fellowship  Foundation. 
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Fig.  1:  Neurocontroller  Proposed  by  Bar-Kana 
and  Guez  [1] 

The  parallel  distributed  adaptive  controller  receives  the 
input  'features’  vector  f(ut,  xt,  y)  and  generates  as  an  output 
the  process  control  vector  u,  where 

u(t) «  K(t)f(u„  x„  y)  (5) 

where  K(t)  is  the  adaptive  gain  matrix  cf  appropriate 
dimension.  Each  K;;  gain  monitors  the  sensitivity  of  the  i-th 
control  loop,  namely  uj,  to  the  j-th  feature  of  the  system, 
namely  fj(ut,  xt,  y)  and  may  be  viewed  as  the  state  of  the  (/• 
th  neuron. 

The  Kjj  gain  adjusts  its  value  independently  of,  and 
simultaneously  with  all  other  gains,  according  to: 

Kjj(t)  =  Mjj(t)  +  N;j(t)  (6) 

Mjj(t)  ■  cejj(yH  -  y;  )fj  (7) 

|-Nlj(t)  =  -[iijNij(t)+Yij(yt.-yi)fj  (8) 

where  Op,  p;j  and  Yu  arc  positive  constants. 

Bar-Kana  and  Guez  show  that  perfect  regulation  of  the 
plant  may  be  obtained  in  the  purely  deterministic  case. 
Notice  however  that,  while  reducing  the  amount  of  prior 
knowledge  required  to  guarantee  stability  of  the  adaptive 
systems,  this  simple  adaptive  controller  does  not  take  into 
account  any  valid  knowledge  on  the  specific  plant  to  be 
controlled.  No  learning  is  used  to  improve  performance  in 
future  trials  or  to  make  the  control  assignment  easier.  The 
are  nlwavs  zero,  as  no  better  initial 
value  is  avatlaoie,  ana  also  me  range  or  ute  adaptive  gains 
may  me  unnecessarily  large.  Indeed  the  main  purpose  of  this 
simple  algorithm  is  first  of  a!!  to  reduce  the  necessary 
condition  to  mere  stabilizability. 

A  general  block  diagram  of  the  system  proposed  by 
Selinsky  and  Guez  [2]  for  a  robot  manipulator  is  shown  in 
figure  2 .  It  consists  of  the  Exploratory  Schedule  Generator, 
'eurocontroller  and  associated  Learning  Algorithm,  the 
Robot,  and  a  mechanism  which  allows  selecting  between  the 
User  /  Path  Planner  and  ES  Generator  as  the  originator  of  the 
desired  trajectory  (qd). 


Fieure  2:  Block  diagram  of  the  system  proposed 
by  Selinsky  and  Guez  (2J 


The  robot  dynamic  model  is  assumed  as  follows: 

D(q)q  +  C(q,q)q  +  G(q)»t  (9) 

is  an  n  x  n  matrix  of  inertial  terms, 
is  an  n  x  n  matrix  of  coriollis  and 
centrifugal  terms , 
is  an  n  x  1  vector  of  gravitational 
terms, 

is  an  n  x  1  vector  of  joint  coordinates, 
is  an  n  x  1  vector  of  forces/torques, 
is  the  number  of  degrees  of  freedom. 

Equation  (9)  can  also  be  expressed  as 


Notice  in  equation  (13)  the  similarity  to  the  LMS  learning 
rule  (see  NVidrow  and  Steam  ( 4])  where  the  weight  change 

is  proportional  to  the  error  e  and  the  input  features  X.  In 
equation  (13)  the  input  features  are  the  Yij(.)  functions  and 
the  error  is  e^.  • 

Figure  3  shows  the  internal  structure  of  the 
neurocontroller  for  joint  i.  Each  feedforward  network 
module  is  trained  to  provide  one  of  the  Yij(q,q,qr,qr] 
functions.  This  training  can  be  done  offline  since 
the  Ytj[q,q,qr,<ir]  functions  are  known  a  priori  and 
are  the  same  for  all  rigid  robots  of  the  same 
kinematics  and  number  of  degrees  of  freedom.  The 
inputs  to  the  neurocontroller  are  the  components  of  the 
trajectories  as  required.  The  outputs  of  the  neurocontroller 
are  the  control  torques  to  be  applied  at  each  joint  of  the 
manipulator. 


where 

D(q) 

C(q,q) 

G(q) 

q 

X 

n 


X  =  D(q)q  +  C(q,q)q  +  G(q)  =  Y(q,q,q,q)0  (10) 

where  Y(q,q,q,q)  is  an  n  x  p  matrix  of  known  functions 
and  0  is  an  p  x  1  vector  of  weighting  constants  (see  Slotine 
and  Li  ( 3]). 

When  (10)  is  written  in  terms  of  the  individual  torques  at 
each  joint,  it  can  be  viewed  as  a  single  layer  linear  network, 
where  the  inputs  to  the  network  are  the  Yjj(.)  and  the 
weights  are  the  0:  ( See  Figure  3). 

Note  that  the  Yjj(.)  are  transcendental  algebraic  functions 
of  the  manipulators  states  that  are  a  priori  known  and  may  be 
realized  via  feedforward  neural  networks  that  are  trained 
offline  with  a  suitable  learning  algorithm . 

Let  qd,  qd,  qd  designate  the  desired  trajectory. 
Following  Slotine  and  Li  (  3]  define  the  virtual  reference 
trajectory  qr,  qp  and  the  virtual  trajectory  velocity  error  Cr 

qr  =  qd-Ae,  qr  =  qd-Ae,  er  =  q*qf  =  e  +  Ae(ll) 

where  e(t]  =  q[t]-qd[t]  is  the  joint  coordinate  error,  and  A  is  a 
positive  definite  matrix  with  constant  coefficients. 

The  output  of  the  neurocontroller  implements  the  control 
law 


Xj 


£Ylj[q,q,qfcjf]0.  +  Kdiieq 


i  =1.2... ,n  (12) 


Figure  3:  Neurocontroller  structure  for  joint  i. 
The  main  result  in  [2]  is  that  If  p  £  n,  and  if 


Rank 

then  the  controller  (eq.  (12),  (13))  guarantees  global 
asymptotic  tracking  of  the  desired  trajectory  and 
identification  of  the  weights. 

3.  THE  TWO  DEGREE  OF  FREEDOM  ADAPTIVE 
CONTROLLER 

Selinsky  and  Guez  indeed  use  learning  in  the  adaptation 
process.  Their  exploratory  schedules  also  guarantee  that 
along  with  tracking,  the  unknown  plant  parameters  are 
finally  identified.  Yet,  in  order  to  get  satisfactory  results  for 
both  tracking  and  learning,  the  supplementary  gains  Kj-  are 

needed  in  (12).  The  results  of  both  tracking  and  learning  are 
strongly  affected  by  the  right  selection  of  these  parameters 
[3],  and  in  practice  they  are  fixed  by  trial  and  error 
procedures.  Furthermore,  if  one  selects  high  fixed  gains  to 
get  small  tracking  errors,  one  then  also  gets  high  noise 
amplification  and  possibly  high  cost  of  control  even  when 
not  necessarily  needed.  An  adaptive  procedure  for  Kd"  like 

in  Bar-Kana  and  Guez,  will  fix  the  gains  as  a  function  of  the 

.  ..<•  -  . . .  v,  .« . . 


I  lim  t->  °° 


where  (3  denotes  the  estimate  of  0.  and  the  Kd,.  are 
constant  weights  for  the  servo  portion  of  the  controller . 
Equation  (12)  has  been  shown  to  be  asymptotically  stable 
when  the  learning  ruie 

n 

^•LjT-Yijtq.q.M,]^  .  j  =  1,2,..,  p  (13) 
i  a  1  °ii 

is  used  (Slotine  and  Li  [3]) . . . 


tracking  error  (in  our  case  represented  by  ^ )  tends  to 
increase,  and  decrease  afterwards,  fitting  thus  the  right 
control  gains  to  the  right  situation. 

In  this  article  we  propose  the  2  DOF  controller  shown  in 
Fig.  4. 

The  robot  dynamic  model  is  given  by  equation  (9),  the 
identification  and  linearization  neurocontroller  algorithm  is 
given  by  equations  (11),  (12)  and  (13)  and  the  adaptive 
servo  neurocontroller  algorithm  has  the  form  of  equations 
(5),  (6),  (7)  and  (8)  but  for  notation  compatibility  it  is 


redefined  as 

Kd..(t)  =  Kd,..(O^KJpij(t) 

Kd^O-aa&r 
K<J..(t)  =  *PijNij(0+TftjCft_ 


Identification 

and 

Linearization 

Neurocontr.olier 


Main  Result.  It  can  be  shown  [5]  that  global  asymptotic 
tracking  and  identificadon  as  in  [2]  still  holds  even  for  the  2 
DOF  controller,  i.  e.,  when  the  servo  gain  is  time  varying 
according  to  equarion  (14).  That  is,  global  asymptotic 
tracking  as  well  as  global  asymptotic  identification  of  up  to  n 
robot  parameters  can  be  guaranteed  with  our  2  DOF  robot 
controller  if 

,  Km.  [Rank  (Y(q,  q,  qd,  q,))]  a  p.  (17) 

4.  ROBOTIC  EXAMPLE 
We  simulated  the  exact  system  as  in  [2]  with  the  new  2 
DOF  controller  as  follows: 


ROBOT 


wifiHtnCTimawi 


5 

Adaptive  Servo 

Neuroccntroller 

+4 —  : 

(following  [1] 

j 

.  4:  Two  Degrees  of  Freedom  Robot 
Controller. 

Figure  5  -  Double  link  Manipulator 
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Figure  6.  Fixed  Gains:  (a)  A<T>j(t),  A<b2(t),  A<D3(t);  (b)  Ad>4(t),  A4>5(t); 

(c)  Tracking  Errors  e^t).  ej(t);  (d)  Controller  Gains  Kd, ,,  Kd22,  Kdn(t),  Kd22(t). 


Figure  5  describes  the  double  link  system  configuration. 
The  equations  of  motion  are: 

-Dn'4  +  D  ,2%  +  D(q?  +  2q,^)  +  D, 

t2*D11ql  +  Daqi-Dqll  +Dj 
where 

«  Lj  *  1.0  m,  mj  ■  mj  ■  10  kg,  g»9.81  m/s2, 

Du  *  (mt+  m^L  2 +mjLj  +2mjLiLiCosq2, 

+  m1LiL2cosq2,_ 


Djj*  m2L2 , 

D  *  -miLiL^sinqj, 

Dt*(mi+  mjlgLisinqj+mjgl^sinfqi-t^, 

D2  *  nijgLjsin(qt+  qj). 

and  where  g  is  gravity,  tt  and  t2  are  torques  at  the  first  and 
second  joints,  mi  and  m2  are  masses,  and  L|  and  Lj  are 
lengths  of  the  first  and  second  link  correspondingly. 


Figure  7.  FivH  -nine  h)  AO  ■  \  .  .  A03(t);  (b)  A04(t),  /  . 

(c)  Tracking  Errors  e^t),  e2(t);(d)  Trajectories  q^ft),  q2(t); 

(e)  Trajectories  a  .,ft).  q;(t):  (0  Controller  Gains  Krfn,  Kd22j  Kdu(t),  Kd2£<t). 


As  already  mentioned,  the  performance  of  the  adaptive 
ro  otic  control  system  is  strongly  affected  by  the  selection  of 
th  constant  control  gains  K^.  Results  of  simulations 
sv  t  constant  gains  Kdj  j  **  100.,  K<j22  ■  SO.,  are  shown  in 
Fi  ure  6  Figure  6a  shows  the  identification  error  for  the 
fit  t  three  parameters,  figure  6b  shows  the  error  for  the 
re  .ainins  parameters,  and  figure  6c  represents  the  tracking 
er  trs  It  can  be  seen  that  neither  the  parameters  are  very 
w  1  identified,  nor  the  tracking  is  very  good.  In  parallel  with 
th  closed-loop  system  we  compute  (m  open-loop)  and  show 
in  igure  6d,  lor  illustration,  the  constant  gains  Kdu  « 100, 
K  *  50,  and  the  corresponding  adaptive  gains  Kdn(t)  and 
K  ,  (t)  that  would  result  from  these  tracking  errors.  It  can 

bt  seen  that  adaptive  gains  corresponding  to  such  errors 
w  jld  be  very  large.  We  can  expect  that  m  closed-loop,  the 
la  ;er  adaptive  gains  would  actually  reduce  the  tracking 
er  irs. 


Results  of  simulations  with  abetter  selection  of  constant 
g:  .is  Kdu  -  1000.,  Kdn  -  500.,  are  shown  m  Figure  7. 


Figure  7a  shows  the  identification  error  for  the  first  three 
parameters,  and  figure  7b  shows  the  error  for  the  remaining 
parameters.  It  can  be  seen  that  all  parameters  are  eventually 
identified  within  ±  Sm.  Figure  7c  represents  the  tracking 
errors,  and  in  7d  and  7e  we  show  the  desired  and  the  actual 
trajectories,  correspondingly.  It  can  be  seen  that  results  of 
both  identification  and  tracking  are  satisfactory.  Figure  7f 
shows  the  constant  gains  Kdjj  ■  1000,  Kd22  *  500,  and  the 
corresponding  adaptive  gains  Kd}1(t)  and  Kd22(t).  It  is 
interesting  to  see  that  the  adaptive  gains  reach  similar  values 
with  the  "good"  constant  gams,  in  order  to  maintain  small 
tracking  errors. 


Results  of  simulations  with  adaptive  control  gains  in 
closed-loop  are  shown  in  Figure  8.  Figure  8a  shows  the 
identification  error  for  the  first  three  parameters,  and  figure 
8b  shows  the  error  for  the  remaining  parameters.  It  can  be 
seen  again  that  all  parameters  are  eventually  identified  within 
±  5m.  Figure  8c  represents  the  small  tracking  errors,  and 
figure  8d  shows  the  behavior  of  the  adaptive  gains  Kn(t) 
and  K22(t).  It  can  be  seen  how  the  adaptive  gain  moves  up- 
and-down  in  order  to  maintain  small  tracking  errors. 


Figure  8.  Example;  (a)  AOj(t),  AO_2(t),  AOjO);  (b)  Ad>$(t), 

(c)  Tracking  iirroi  s  e^t),  ej(t);  (d)  Adaptive  Controller  Gains  Kjj  j(t),  K^O). . 


5.  CONCLUSION 

This  paper  combines  a  simple  technique  for  adaptive 
robot  control  proposed  and  an  exploratory  learning  schedule 
into  a  unified  two.  degrees  of  freedom  (DOF)  robot 
controller,  thereby  improving  the  performance  of  both 
approaches.  The  power  of  the  new  controller  is  tested  with  a 
two-joints  rigid  manipulator  example. 
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ABSTRACT, 


We  apply  neural  networks  computing  architecture  to  the  solution  of  the 
Inverse  Kinematic  Problem  (IKP),  and  to  the  adaptive  and  learning  on  line 
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We  also  found  that  global  asymptotic  tracking  and  parameters  closed  loop 
learning  are  attainable  with  neurocontrollers. 
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Abstract 

A  nonalgorithmic  method  is  presented  for  the  solution  to 
the  inverse  kinematic  problem  of  a  robot.  The  method  is  robot 
independent  and  involves  a  hybrid  approach,  whereby  a  neural 
solution  is  augumented  with  an  iterative  procedure  which 
provides  the  final  solution  within  some  specified  tolerance. 
Essentially  the  neural  solution  is  similar  to  a  lookup  table  in 
providing  a  good  initial  guess  to  a  classical  iterative  search.  It 
has  been  found  that  for  the  industrial  manipulator  PUMA  560 
the  proposed  hybrid  method  achieves  about  2-fold  increase  in 
computational  efficiency  with  better  uniformity  of  the  time 
required  to  obtain  the  solution  to  the  robotic  manipulator. 


1.  Introduction  and  Background 
The  inverse  kinematic  problem  (KP)  deals  with  finding 
the  'n'  joint  angle  values  *q'  of  the  robot  that  will  position  the 
end-effector  in  a  desired  position  and  orientation  'X'  in  the 'm' 
dimensional  workspace.  This  may  be  expressed  as: 

q  =  f'kX)  (1) 

However,  in  general  this  solution  is  not  unique.  In  many  cases 
(e.g.  redundant  manipulators)  there  may  result  infinite  number 
of  solutions.  In  these  cases,  additional  constraints  [15]  in  terms 
of  the  allowed  configurations  or  performance-function 
minimization  are  used  to  reduce  the  number  of  legitimate  joint 
configurations  or  to  single  out  a  unique  preferable  one  (see 
example  1  below).  Therefore,  where  redundancy  provides  us 
with  more  flexibility  it  also  requires  elaborate  solution 
techniques  which  are  time  consuming  and  generally  impractical. 
The  conventional  methods  consist  of  closed  form  methods  and 
iterative  methods.  These  are  either  limited  to  only  a  class  of 
simple  non-redundant  robots  or  are  time  consuming  and  the 
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solution  may  diverge  due  to  bad  initial  guess  [3]. 

In  a  class  of  neural  networks  (NN)  called  Feedforward 
Networks  the  processing  elements,  termed  as  nodes  indicated  by 
circles  in  Fig.  1,  are  connected  in  layers  through  links,  termed 
as  weights  indicated  by  arrows  in  Fig.  1. 


Fig.  1 :  Feedfoward  Neural  Network. 

The  output  of  the  node  is  a  function  of  the  inputs,  which  are 
weighted  outputs  of  the  nodes  of  the  previous  layer,  and  the 
threshold  of  the  node.  The  learning  takes  place  through  the 
modification  of  the  weights  and  the  thresholds  as  specified  by 
the  training  algorithm  that  acts  on  die  supplied  input  and  output 
data  pairs  as  the  training  set.  The  training  algorithm  used  in  our 
simulations  is  the  Back  Error  Propagation  (BEP)  Algorithm 
[12],  The  nodes  to  which  the  input  is  applied  are  called  as  the 
input  nodes  and  the  nodes  from  which  the  output  is  taken  are 
called  as  the  output  nodes.  The  remaining  nodes  are  termed  as 
hidden  nodes. 

The  solution  to  the  IKP  has  been  attempted  by  different 
approaches  .,y  a  number  of  researchers  including 
[2], [7], [8], [9], [10].  Here  we  address  the  problems  in 
conventional  methods  and  suggest  a  combination  of  the  neural 
network  with  a  conventional  method  to  aid  in  eradicating  them. 
Previous  work  by  us  in  this  direction  employing  neural 
networks  only  [7]  yielded  good  results  but  were  not  accurate 
enough  to  be  practically  utilized.  However,  these  results 
encouraged  us  to  extend  this  approach,  as  presented  here,  which 
is  nonalgorithmic  and,  therefore,  not  specific  to  any  single 
manipulator. 
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Fig.  2  Schematics  of  the  proposed  method  employing  the  NN  to  provide  the  initial  guess  qg  to  the  iterative 
procedure. 


2.  The  Hybrid  Method 

The  formation  of  proposed  method  starts  with  the  robotic 
manipulator  for  which  the  IKP  is  to  be  solved  and  a  ’blank’ 
(untrained)  NN  of  suitable  size.  The  selection  of  the  size  is 
empirical  in  nature.  Then  training  data  in  the  form  of  pairs  of  the 
end-effector  position  and  orientation  and  the  corresponding  joint 
values  is  generated.  This  data  is  used  for  training  the  NN  with 
the  position  and  orientation  vector  as  the  input  and  the 
corresponding  joint  values  as  the  desired  output.  After  the 
training  is  completed  the  trained  NN  is  coupled  with  the  iterative 
method,  as  shown  in  the  Fig.  2(b),  for  the  purpose  of  operation. 
During  the  operation  phase  the  desired  position  and  orientation 
of  the  end-effector  is  provided  to  the  NN.  The  NN  gives  the 
approximate  solution  based  on  the  learned  connection  weights. 
This  approximate  solution  is  taken  as  the  initial  guess  by  the 
iterative  method  to  give  the  final  solution  within  the  specified 
tolerance.  We  consider  that  the  type  of  solution  out  of  the  finitely 
many  solutions  is  pre-specified  to  us  and  therefore  training  of 
the  NN  is  restricted  to  the  set  of  examples  that  pertain  to  this 
specific  solution  only. 


3.  Examples 

The  back  error  propagation  (BEP)  algorithm  simulating  a 
three  layer  perceptron  was  employed  to  tackle  the  problems 
described  below.  Continuous  inputs  and  outputs  were  assumed. 
The  nodes  assumed  the  symmetric  sigmoidal  nonlinearity  [13]. 
Parameters  of  the  NN  are  as  given  in  reference  [1].  Training  was 
terminated  when  it  was  seen  that  the  errors  were  not  improving. 

A  3-DOF  Planar  Robot 

In  a  plane  it  is  clear  that,  for  the  purpose  of  positioning 
only,  two  DOF  are  sufficient.  Therefore,  a  3-DOF  planar 
manipulator,  for  this  purpose,  has  one  redundant  DOF,  see  Fig. 
(3).  This  extra  DOF  introduces  a  problem  of  infinitely  many 
joint  configurations  resulting  in  the  same  end-effector  position. 
This  problem  can  be  resolved  by  adding  an  additional 
independent  constraint  to  be  fulfilled  in  addition  to  the 
positioning  of  the  end-effector. 


The  performance  criterion  optimized  for  the  3-DOF 
planar  manipulator  was  the  "Manipulability  Index”  [15],  defined 
as: 

H  =ViFi  (2) 

where  J  is  the  Jacobian  of  the  manipulator.  Solution  of  this 
manipulator  was  obtained  by  iteratively  solving,  by  Newton- 
Raphson  method,  the  equations  obtained  by  the  procedure 
outilined  in  reference  [4], 

The  network  for  this  case  constituted  of  2  inputs  and  3 
output  nodes  and  two  hidden  layers,  each  containing  10  nodes. 
The  network  was  trained  on  a  data  set  giving  the  X-Y 
coordinates  for  the  end-effector  and  the  three  joint  angles  that 
optimized  the  manipulability  criterion.  Single  inverse  in  the 
training  set  was  ascertained  by  taking  the  initial  condition  as  the 
current  configuration  for  the  arm  solution  to  a  subsequent  nearby 
end-effector  position. 

The  result  of  this  training  is  shown  in  Fig.  4.  This  figure 
shows  the  positioning  error  in  the  cylindrical  coordinate 
workspace.  It  is  seen  that  the  error  begins  to  increase  as  the 
manipulator  moves  nearer  to  the  singularity.  In  this  case 
singularity  occurs  at  02  =  63  =  0  radians. 

When  the  NN  solution  was  coupled  with  the  Newton- 
Raphson  iterative  procedure,  with  a  tolerance  of  10'6  meter  in 
the  end-effector  positioning,  it  resulted  in  nearly  a  2-fold 
increase  in  computational  efficiency.  Table  1  gives  a  comparison 
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Fig.  4:  Result  of  a  NN  trained  to  give  a 
manipulability  index  optimized  solution 
for  a  3-DOF  planar  manipulator.  The 
verticle  axis  corresponds  to  error  in 
positioning  of  the  end-effector. 


Table  1:  Comparison  of  the  proposed  solution 
with  the  Fixed  initial  guess  Newton 
Raphson  Method,  for  a  3-DOF 
manipulator.  The  data  corresponds  to 
100  Samples. 
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Fig.  5:  Iteration  sequences  showing  a  comparison  for  the  convergence  of  the  Newton-Raphson  method  for  the 
initial  guess  given  by  the  neural  network  and  the  fixed  initial  guess. 


between  the  number  of  iterations  required  for  the  iterative 
method  with  fixed  initial  guess  and  the  initial  guess  given  by  the 
NN.  The  fixed  initial  guess  was  taken  as  9i=0°,  02=90°, 
03=5°.  It  was  found  that,  on  the  average,  the  iterative  method 
alone  with  fixed  initial  guess  required  6.0  iterations  to  give  the 
arm  solution,  whereas  the  iterative  method  that  utilized  the 
solution  given  by  the  NN  as  the  initial  guess  required  on  the 
average  2.9  iterations.  Moreover,  the  standard  deviation  of  the 
number  of  iterations  for  the  fixed  initial  guess  is  10  times  that  for 
the  NN  initial  guess.  Fig.  5  shows  the  iteration  sequences  for 
the  fixed  initial  guess  and  the  initial  guess  given  by  the  NN  for 
two  situations.  Fig.  5(a)  is  one  of  the  most  typical  situations. 
From  Fig  5(b)  we  see  that  sometimes  the  number  of  iterations 
may  become  large  for  the  fixed  initial  guess 
The  Human  Aim 

Here  we  show  an  attempt  to  capture  the  criteria  that  a 
human  being  allegedly  optimizes  in-  manipulating  different 
objects  by  training  the  NN  by  a  data  set  corresponding  to  some 
specified  task.  Planar  motion,  parallel  to  the  ground  was  the 


considered  task.  The  subject  was  asked  to  move  an  object  in  free 
space,  in  a  plane  parallel  to  the  ground.  Knowing  the  actual 
distances  between  the  joints  the  data  set  was  filtered  to  achieve  a 
10.0%  tolerance  about  the  respective  actual  values.  The  data  set 
thus  obtained  contained  only  43  words  out  of  a  total  of  78 
words.  A  NN  similar  to  that  for  the  3-DOF  case  was  trained  on 
this  data  set. 

Fig.  6(a)  shows  the  plot  of  the  error  in  the  positioning  of 
the  hand  resulting  for  the  trained  data  set,  while  Fig.  6(b)  shows 
the  same  for  a  different  data  set  obtained  separately  from  the  data 
set  on  which  the  network  was  trained.  The  diameter  of  the 
circles  indicate  the  magnitude  of  error  occured  at  their  center. 
The  two  figures  have  similar  errors  indicating  that  the  neural 
network  has  generalized  on  the  trained  data  set.  Large  errors 
near  X  =  0.5  m  are  perhaps  due  to  the  singularity  reasons  or 
insufficient  data  near  that  region.  Further,  it  was  observed  that 
the  values  for  the  elbow  joint's  were  learned  much  better  than 
those  for  the  wrist  joint  and  the  shoulder  joint 
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(a)  Hand  positioning  error  for  the  training 
data. 


(b)  Hand  positioning  error  for  other  than  the 
training  data. 


Fig.6:  Results  of  single  MFN  trained  on  human  arm  data  in  a  plane  perpendicular  to  the  gravity.  Shoulder  is 
located  at  (0,0). 


As  seen  earlier,  the  filtered  data  set  was  only  55%  of  the 
total  data  gathered  with  10%  tolerance  which  indicates  that  the 
precision  of  the  training  data  may  not  be  adequate.  However,  it 
is  observed  that  the  NN  is  able  to  generalize  upon  the  training 
data  giving  similar  results  for  the  untrained  data  to  that  of  the 
trained  data  implying  that  implicit  performance  indices  can  be 
captured  via  NNs  and  perhaps  identified  via  weight  pruning  and 
analysis. 

The  PUMA  560 

The  PUMA  560  parameters  were  taken  from  [6],  page 
37.  This  manipulator,  which  is  a  6-DOF  robot,  was  chosen  for 
ease  of  generation  of  data  and  verification  of  results  since  it  has  a 
closed  form  solution.  PUMA  560  has  eight  solutions  for  a  given 
position  and  orientation  signified  by  Right/Left  -  Shoulder, 
Above/Below  -  Elbow,  and  Up/Down  -  Wrist.  In  our 
simulations  the  training  data  corresponds  to:  LEFT  Arm, 
BELOW  Elbow  and  UP  Wrist  configuration.  In  the  simulations 
the  joint  limits  used  for  the  6th  joint  were  -180°  to  180°  instead 
of -2660  to  266°. 

The  network  in  this  case  consisted  of  6  input  nodes,  one 
output  node  each  for  the  6  joints  and  two  hidden  layers  for  each 
joint  consisting  of  32  nodes  in  the  first  layer  and  8  nodes  in  the 
second  layer.  The  average  error  and  the  standard  deviation  of  the 
error  in  the  solution  given  by  the  NN  for  each  joint  taken  over 
100  samples  is  given  in  table  2.  From  table  2  it  is  seen  that  the 
solution  given  by  NNs  is  more  scattered  for  Joints  4  to  6  as 
compared  with  the  joints  1  to  3. 

Next,  the  proposed  method  was  compared  by  giving  a 
fixed  estimate  to  the  iterative  procedure.  This  Fixed  Estimate 


was  taken  as:  9)  =  0,  02  =  0,  63  =  tt/4,  64  =  0, 65  =  tt/4,  and 
06  =  0,  which  is  a  configuration  corresponding  to  which  the  NN 
was  trained,  as  indicated  in  the  beginning  of  this  section.  In  the 
simulations  the  equations  were  solved  by  Gauss  Elimination 
method  and  partial  pivoting.  The  maximum  number  of  iterations 
allowed  for  the  iterative  method  were  100.  The  iterative  method 
was  successfully  terminated  when  the  norm  of  the  difference 
between  the  desired  and  actual  end-effector  position  and 
orientation  was  less  than  1.0E-4.  The  average  and  standard 
deviation  for  the  number  of  iterations  for  the  proposed  method 
and  the  Fixed  Estimator,  in  a  run  of  100  data  points  is  given  in 
Table  3.  From  this  table  we  can  see  that  the  proposed  method 
achieves  more  than  a  two-fold  efficiency  in  computing  with 
more  consistency.  Moreover,  it  was  observed  that  the  time  taken 
by  the  NN  equals  two  time  units  of  the  iterative  procedure, 
which  amounts  to  less  than  10%  of  the  time  required  to  get  the 
solution  by  the  Fixed  Estimator  method. 

Fig.  7  shows  the  comparison  plots  for  two  cases 
indicating  the  sequence  of  Newton  Raphson  iterations  for  the 
fixed  initial  guess  and  the  guess  given  by  the  NN.  Here  the  error 
along  the  ordinate  axis  corresponds  to  the  norm  of  the  error, 
defined  as: 

Error  =  V  zl+e2y+ejrcl+e&e2v  (3) 

where  e  is  the  difference  between  the  desired  and  the  actual 
values  and  the  subscripts  indicate  the  variable  for  which  this 
difference  is  taken.  The  allowed  end-effector  position  and 
orientation  error  in  this  case  was  10'7.  These  are  typical  cases 
and  we  can  see  form  these  that  the  number  of  iterations  required 
for  the  proposed  method  has  been  decreased  by  two-fold. 
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Table  1'  Statistics  of  the  solution 
given  by  the  NN. 


Joint 

# 

Error  in  De  free* 

Average 

Suuidard 
Dev umon 

t 

4.06 

13.428 

2 

•0.16 

30.492 

3 

5.64 

11.52 

4 

3.66 

61.236 

5 

-3.70 

24.912 

6 

■6.02 

36.828 

Table  3:  Comparison  of  the  proposed  solution 
with  the  Fixed  initial  guess  Newton 
Raphson  Method,  for  PUMA-560 
manipulator.  The  data  corresponds  to 
_ 100  Samples. _ 


Initial  guess  for 

Newton-Raphson 

Method 

Number  of  Iterations 

Average 

Standard 

Deviation 

Neural  solution 

9.96 

12.95 

Fixed 

21.09 

18.90 

(a)  (b) 

Fig.  7  :  Sequence  of  Newton  Raphson  iterations  for  the  guess  given  by  the  NN  and  the  fixed  initial  guess. 


4.  Discussion 

The  proposed  hybrid  method,  which  takes  the  solution 
given  by  the  trained  NN  as  an  initial  guess  to  an  iterative 
procedure  (Newton-Raphson  in  our  case)  combines  the 
advantages  of  the  NN  and  iterative  methods,  these  being  (1) 
independent  of  the  type  of  the  manipulator,  (2)  simple  to 
implement.  Only  forward  kinematics  is  required  for  this  method 
and,  as  shown  by  our  simulations  this  combination  results  in  an 
increase  in  computational  efficiency  by  2-fold  for  a  3-DOF 
planar  manipulator  and  the  PUMA  560  (6-DOF)  robot.  It  is 
observed  that  the  efficiency  increases  with  the  increase  in  the 
size  (the  number  of  nodes  in  the  hidden  layers)  of  the  network 
for  that  manipulator.  Therefore,  a  trade-off  between  the  size  of 
the  network  /  the  training  time  and  the  accuracy  of  the  solution 
given  by  the  NN  exists  and  can  be  employed  in  order  to  increase 
the  efficiency  of  the  method  or  decrease  the  training  time.  Where 
otherwise  the  number  of  iterations  may  be  unacceptable  for  real 
time  operation,  an  initial  good  guess  given  by  the  neural  network 
consistently  cuts  the  number  of  iterations  to  only  a  very  few 
allowing  a  better  operation  of  the  manipulator.  This  results  in 
minimal  processing  within  each  control  cycle  and  improves  real 
time  control  performance.  Basically  the  NN  operates  as  a  lookup 
table  in  providing  a  good  initial  guess  to  an  iterative  procedure. 


However,  when  the  NN  is  implemented  in  hardware,  after  the 
initial  off-line  training,  the  adaption  of  the  weights  and  therefore 
the  learning  of  the  inverse  kinematics  can  be  continued  on-line. 
This  will  continue  to  decrease  the  error  of  the  solution  given  by 
the  NN.  Therefore,  the  number  of  iterations  required  by  the 
iterative  method  would  approach  zero. 
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ADAPTIVE  POLE  PLACEMENT  FOR  NEUROCONTROL 
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Abstract :  In  this  papei  we  deal  with  the  study  of  a  self  organizing  neural  network  architecture 
called  the  Adaptive  Resonance  Theory  (ART-II)  [Grossberg  and  Carpenter  1987(a)],  in  its 
application  to  a  simple  problem  of  adaptive  control,  through  real  time  dynamic  system 
identification.  An  adaptive  pole  placement  controller  for  a  linear  second  order  system  is 
implemented  based  upon  this  architecture  to  assess  the  performance  of  the  network  and  the  overall 
control  scheme  with  the  neural  network  in  the  control  loop.  The  network  employed  demonstrates 
the  capabilities  of  fast  classification  and  learning.  These  attributes  of  the  architecture  are  exploited 
to  train  a  network  to  dynamically  identify  parametric  variations  of  a  plant  in  response  to  changes  in 
its  environment.  The  control  strategy  is  based  upon  identification  of  changes  in  the  time  response 
characteristics  of  the  system  to  standard  test  signals  which  are  assessed  by  the  neural  network.  A 
pole  placement  algorithm  is  utilized  to  relocate  the  poles  of  the  overall  closed  loop  system  by 
altering  the  gains  of  the  process  controller  to  obtain  desired  system  response.  Experimental  studies 
on  a  simulated  second  order  system,  employing  a  Proportional  Derivative  controller  show  that  the 
neural  network  considered  is  indeed  capable  of  system  identification  and  simple  indirect  adaptive 
control  of  low  order  plants  that  are  subjected  to  parametric  variations  reflected  by  changes  in  their 
operating  environments. 

Problem  Statement  :  Our  goal  is  to  implement  an  adaptive  pole  placement. algorithm  using  a 
neural  network  architecture.  Let  the  possibly  slow  time  varying  linear  dynamical  system  (Plant) 


Figure  1  Process  block  diagram 


Plant  transfer  function :  Gd  (s)  =  r 

s +gs  +  p  (1) 

Let  the  reference  input  be  a  known  periodic  function 

Reference  inp'it:  r  (t)  =  r  (t  T) ,  for  all  t  >  0  (2) 

Find  online,  the  Proportional  and  Derivative  (PD)  gains,  (Kp,  Kd)  and  the  D.C.  bias  (C) 
(sc  2  Figure  1),  such  that  the  control  law  give  by  equation  (3), 

Control  law :  u  =  C  r  -  Kpy  -  Kds  y  =  C  r  -  Gcy  (3) 

will  result  in  the  ideal  closed  loop  transfer  function 


Y  (s)  » !  w  n 

Go(S)  =  r  =  K  | - ^ 5  =  Ge 


where,  K*.  are  respectively  the  desired  D.C.  gain,  damping  coefficient  and  the  natural 
frequency  of  the  system. 


Neuromorphic  approach  to  the  controller  design  :  The  general  block  diagram  of  the 
overall  control  scheme  is  depicted  in  figure  3.  The  common  a  priori  assumption  in  the  design  of 
controllers  forpartially  known  processes  is.adopted  with  the  design  procedure  being  divided  into 
two  steps:  identification  and  control  (indirect  adaptive  control  strategy)  [Astrom  & 
Eyklioff,1971]. 


Figure  2  Block  diagram  of  the  neuromorphic  adaptive  control  scheme 


Objective  :  Given  a  second  order  plant  with  unknown  constant  or  time  varying  parameters,  the 
objective  of  the  control  scheme  is  to  tune  the  controller  by  modifying  its  arbitrarily  assigned  initial 
gains  such  that  the  overall  closed-loop  system  response  matches  the  one  given  by  the  assumed 
ideal  model,  based  upon  parameter  identification  provided  by  the  neural  network  in  the  control 
loop.  It  shoukfbe  noted  that  although  the  plant  and  the  controller  are  linear  the  process  is  overall 
nonlinear. 


Methodology  :  When  the  input  is  a  unit  step  function,  identification  of  the  plant  parameters  is 
achieved  by  extracting  the  features  of  the  system's  closed-loop  transient  time  response  via  a  feature 
extractor.  In  the  case  of  the  square  function  and  the  square  wave  function,  identification  is 
restricted  to  the  step  portion  of  the  input  signal.  The  neural  network  in  the  control  loop  which  is 
trained  to  map  the  features  of  a  system's  time  response  to  its  parameters,  gives  the  current 
parameter  estimates  of  the  closed  loop  process.  The  pole  placement  algorithm  in  the  loop  utilizes 
the  estimates  provided  by  the  network  to  compute  new  controller  gains  to  suit  the  desired  response 
specified  by  the  ideal  model.  A  D.C.  adjustment-mechanism  is  incorporated  to  compensate  for  any 
D.C.  bias  that  might  be  associated  with  the  response.  The  system  identifier  comprises  of  the 
feature  extractor,  the  neural  network  and  the  pole  placement  algorithm.  The  purpose  of  fhe 
identifier  is  to  determine  the  plant  parameters  in  response  to  changes  withinnhe  plant  enviror  ,t. 
The  time  response  of  the  system  is  characsenizedby  its  features  ox  performance  indices  w'  are 
nonlinear  functions  of  the  system  parameters.These  include  the  rise  time,  the  settl>  .ime, 
overshoot  and  the  delay  time  of  the  response.  A  feature  extractor  incorporated  in  the  co.  ml  loop 
determines the  performance  indices  associated  with  the  response  to  enable  system  identification  via 
the  neural  network.  It  must  be  noted  that  identification  is  dependent  on  thefeatures  of  the  time 
response  rather  than  the  response  per  se.  This  is  done  in  order  to  compress  the  information 
contained  in  the  response  such  that  the  input  vector  to  the  neural  network  remains  compact  [Kumar 
&  Guez,1989],  This  procedure  restricts  the  dimension  of  the  neural  networkno  a  minimum  thereby 
increasing  its  computational  speed  and  overall1  efficiency  of  the  process.  The  overall  transfer 
function  is  obtained  by  using  equations  (1),(2)  and  (3)  as  follows: 

Y  (S)  *  Gp  (S)  U  (S)  s  Gp  (S)  [  C  R  (S)  •  Gc  (S)  Y  (S)]  (5) 

n  (s\  a  Y  (s) _ C  Gp(s)  _  ,  CK-, _  (6) 

0  R  (s)  1  +  Gp(s)  Ge(s)  “s:+  («  + KJ<<)S  +  -( P-+ K„JCP) 


(7) 


multiplying  and  dividing  the  numerator  by  ( p  +  Km  Kp), 
r  CKm  (P  +  W 

(P  +  KJCp)  S2+(a+KJCd)S+  (P  +  KJC.) 


In  order  to  obtain  the  ideal  transfer  function,  we  set  Gq(S)  =  G0(S)  which  implies: 

CKm  (P  +  W  ^  K.  <oj 

(.P  +  K-nJCp)  [s2+(a  +  K^d)S+  (P  +  KJCp)]  ls2+  2  C*w*ns  +  “‘nl  (8) 


the  new  controller  gains  obtained  from  the  above  equations  are  then  given  by: 
„•  «n--P  „•  2C‘<o*n  -a  .  MP  +  K^'p)  K*wf 

Kp=  Kra  . Ki=  rm  C=  £  kT 


Training:  The  network  employed  is  used  in  the  supervised  learning  mode  and  is  trained  off  line 
before  its  inclusion  into  the  control  loop.The  training  data  module  comprises  of  a  data  generator 
whose  inputs  form  the  approximate  ranges  of  the  system  parameters  which,  in  the  present 
application,  are  the  natural  frequency  and  the  damping  ratio  of  the  generalized  second  order 
system.  A  typical  range  of  the  natural  frequency  is  estimated  from  the  knowledge  the  plant  time 
constant  used  in  the  process.The  ranges  selected  thereof  cover  approximately  the  entire  gamut  of 
the  systems  time  response  to  maximum  deviation  in  the  plant  parameters.  A  system  emulator  that 
consists  of  the  process  model  generates  the  actual  system  time  response  to  the  various  input  signals 
using  different  settings  of  the  parameters  within  the  parameter  ranges  specified.  The  parameters  are 
incriminated  in  discrete  steps  according  to  a  prescribed  resolution  that  is  dependent  upon  the  trade 
off  between  accuracy  of  identification  desired  and  the  size  of  the  network.  The  time  responses  of 
the  assumed  system  model  are  then  passed  on  to  the  feature  extractor  that  determines  the  various 
features  or  performance  indices  of  the  response.  The  performance  indices  along  with  the  respective 
system  parameters  form  the  training  data  set  for  the  neural  network.  Once  the  training  data  is 
available,  the  network  is  configured  such  that  the  number  of  nodes  in  the  feature  representation 
field  corresponds  to  the  dimension  of  the  input  feature  vector, while  the  number  of  processing 
nodes  in  the  category  representation  field  are  generally  set  greater  than  total  number  of  input 
patterns  in  the  prototype  set.  Each  pattern  in  the  prototype  set  is  sequentially  presented  to  the 
network  once.  A  second  cyclic  presentation  of  the  prototype  set  may  be  made  for  a  stable  category 
confirmation.  During  training,  the  attentional  vigilance  parameter  is  set  at  its  highest  value  (0.999) 
to  ensure  a  high  resolution  of  the  resulting  category  structure.  The  category  structure  represents  the 
state  space  partitioning  of  the  neural  network  depending  on  the  number  of  stable  categories 
establishedduring  training.When  the  network  is  presented  with  a  feature  vector  for  the  first  time,  it 
is  encoded  in  the  LTM  through  modification  of  the  LTM  connection  weights.  The  parameters 
associated  with  the  feature  vector  now  get  assigned  to  this  allocated  node  in  the  category 
representation  field.  On  presentation  of  the  subsequent  feature  vectors,  the  network’s  orienting 
subsystem  first  determines  closeness  of  match  between  the  pattern  currently  imposed  on  the 
network  and  any  of  the  patterns  that  have  previously  been  seen.  Since  the  vigilance  parameter  is 
sefso  high  a  new  node  is  allocated  for  the  pattern.  However,  if  the  current  pattern  happens  to  be 
exactly  similar  to  the  one  the  network  has  seen  before  it  is  clustered  into  the  same  category.  It  is 
therefore  possible  to  partition  the  networks  state  space  such  that  each  partition  selves  as  an  attractor 
for  a  particular  type  of  response  characterized  by  its  feature  vector.  After  completion  of  training  the 
top  down  and  the  bottom  up  connection  weights  of  the  network  along  with  the  network  parameters 
are  saved.  To  make  the  network  uniformly  sensitive  to  all  components  of  the  feature  vector,  the 
feature  vector  components  were  enhanced  by  passing  the  feature  vector  through  a  nonlinear 
function  given  by: 

fj(x)  =  e?x  Where y  =  0.05 


Testing:  When  the  network  is  introduced  in  the  control  loop,  identification  proceeds  almost 
instantly.  Search  for  the  category  associated  with  the  right  parameters  is  achieved  by  dynamically 
altering*  the  attentional  vigilance  parameter  until  an  "optimal  vigilance"  is  found,  [Kumar  &  Guez, 
1989]. 


Figure  3  Block  diagram  of  network  training  module 

Experimental  Results  :  The  time  response  of  the  system  is  obtained  using  a  Runge-Kutta 
fourth  order  differential  equation  solver  (RK-4).Each  of  the  following  figures  depict  two  plots,  (a) 

The  response  of  the  actual  system  to  the  unit  step  input  signal  with  arbitrary  initial  parameters  a,  P, 
and  Km,  and  (b)  the  final  system  response  with  the  D.C.  bias  or  the  steady-state  error 
compensated  through  the  parameter  C.  The  neural  network  employed  is  trained  off-line  on  the 
features  of  the  response  obtained  using  the  generalized  second  order  equation  with  the  following 
parameter  variations: 

0.1  1.50,  AC  *0.10 

0.5 2.50,  Ag)b!So.50  (9) 

These  parameters  are  not  made  available  to  the  system  identifier  but  are  estimated  through  the 
neural  network  in  the  control  loop,  based  upon  the  features  of  the  system  time  response  to  the 
standard  test  signals.  The  only  available  information  to  the  feature  extractor  module  which  precedes 
the  neural  network  in  the  overall  process  block  diagram  is  the  time  and  the  value  of  the  system 

response  at  that  time  instant.  Specification  of  the  parameters  cto  andipo  enable  in  the  computation 
of  the  desired  damping  and  the  desired  natural  frequency  of  the  system  response  based  upon  the 
desired  location  of  the  system  closed  loop  poles  within  the  left  half  of  the  S-  plane.  In  the  case  of 
the  square  function  and  the  square  wave,  it  should  be  noted  that  feature  extraction  was  restricted  to 
the  step  portion  of  the  response  (12  seconds).  Figure  (a)  and  (o)  represent  the  nominal  responses 
of  the  assumed  ideal  model  to  the  unit  step  and  the  square  function.  Figures  (c)  and  (d)  depict  the 
time  variation  of  the  plant  parameters  during  the  identification  process.  Figures  (e)  and  (f)  show  the 
ability  of  the  proposed  neuroccntroiler  to  modify  an  overdamped  and  an  underdamped  system 
response  respectively,  with  the  input  being  a  unit  step  function.  It  should  be  noted  that  the  original 
system  response  indicated  refers  to  arbitrarily  assigned  initial  plant  parameters.  The  final  controller 
parameters  are  those  obtained  after  the  plant  parameters  are  estimated  by  the  neural  network  system 
identifier  and  the  relocation  of  the  overall  closed  loop  system  poles  by  the  pole  placement  module. 
Figures  (g)  and  (h)  refer  to  the  square  input  function.  In  Figures  (i)  and  (j)  are  shown  the  output 
responses  of  the  plant  when  the  square  wave  is  imposed  at  the  input  The  time  ,'cale  incorporated  in 
the  simulation  is  adaptive  for  the  unit  step  input  and  depends  upon  the  duration  of  the  system 
transient  rcsponse.The  sampling  frequency  selected  is  common  toibothithe  response  sampling  rate 
and  the  RK-4  differential  equation  solver.  It  Is  well  above  the  Nyquist  frequency  of  the  system. 


Conclusion  :  A  new  approach  for  dynamic  system  identification  has  been  attempted  that  involves 
the  application  of  a;neural  network  architecture  based  on  the  Adaptive  Resonance  Theory.  It  has 
beemshown  that  it  is  possible  to  train  the  ART  neural  net  offline  (supervised  learning  mode)  to 
identify  the  parameters  of  a  system  based  upon  attributes  of  its  response  to  standard  test  signals.  A 
simple  indirect  adaptive  control  scheme  was  formulated,  implemented  and  tested  to  assess  the 
performance  of  the  network  that  was  incorporated  as  the  online  system  identifier  in  the  control 
loop.  Experimental  results  suggest  that  identification  provided  by  the  network  is  accurate  within  the 
resolution  of  the  training  data,  and  its  is  possible  to  control  a  low  order  plant  whose  parameters  are 
either  unknown  or  are  time  varying.  The  off  line  training  of  the  network  was  found  to  be  fast  with 
the  training  experimental  data  set  being  presented  to  the  network  only  once. 
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A  Neiirocontroller  with  Guaranteed  Performance  for  Rigid  Robots 
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Drexel  University,  ECEDepartment 
32nd  and  Chesmut  St.,  Philadelphia,  PA  19104 


Summary 

In  this  article,  we  propose  and  evaluate  a  neural  network  based  adaptive  controller  for  rigid 
robots.  The  proposed  neurocontroller  incorporates  a  priori  knowledge  of  the  robot's  dynamic 
structure  so  as  to  provide  proven  trajectory  tracking  and  parameter  identification.  A  theorem 
regarding  constructive  sufficient  conditions  for  asymptotically  stable  closed  loop  learning  is  stated 
and  used  in  the  design  of  Exploratory  Schedules  (ES).  ES  are  reference  trajectories  which  are 
specifically  designed  to  provide  efficient  learning.  In  neurocomputing,  ES  are  the  training 
samples/examples  that  are  used  to  modify  the  network  architecture  during  learning.  We  specify  the 
ES  as  a  desired  trajectory  that  is  to  be  followed  to  do  learning  while  the  manipulator  is  not  doing 
other  useful  tasks. 

Simulation  results  of  a  2  degrees  of  freedom  (DOF)  manipulator  are  given. 

Rigid  Robot  Dynamics 

The  Lagrange-Euler  formulation  of  rigid  robot  dynamics  [1],  has  the  form 

x  =  Y[q,q,q,q]0  (1) 

where  Y[q,q,q,q]  is  an  n  x  p  time  varying  matrix  of  known  and  generally  nonlinear  functions,  0  is 

a  p  x  1  vector  of  constant  parameters,  x  is  an  n  x  1  vector  of  joint  torques,  q  is  an  n  x  1  vector  of 
joint  coordinates,  and  n  is  the  number  of  degrees  of  freedom. 


Neurocontroller  Design 

When  (1)  is  written  in  terms  of  the  individual  torques  at  each  joint,  it  can  be  viewed  as  asingle 
layer  linear  network,  where  the  inputs  to  the  networkare  the  Yy[.]  and  the  weights  are  the  0j 
[2], [3].  The  Yjj[.]iare  transcendentaUalgebraic  functions  of  the  manipulators  states  and  may  be 
realized  via  feedforward  neural  reworks  that  are  trained  with  a  suitable  learning  algorithm. 

Let  q^,  q^,  q^  designate  the  :  cm  red  trajectory.  Following  [4]  define  the  virtual  reference 
trajectory  qr,  qr,  and  the  virtual  tra-ectory  velocity  error  tj. 


q^qd-Ae,  q^c^-Ae,  e:=q-qr*c-Ac  (2) 

where  e[t]  =  q[t]-qj[t]  is  the  join  coordinate  error,  and  A  is  a  positive  definite  matrix  with  constant 
coefficients. 

The  output  of  the  neurocomrolier  implements  the  control  law 


P 

V  I  Yij[qv4q1.qrJ^i " 

j  =  i 

where  0  denotes  the  estimate  of  O 
the  learning  rule 


i  =  1,2, ... ,  n  , 


(3) 


Equation  1 5 1  hi'  •  -en  shown  to  be  asymptotically  stable  when 


j=l,2 


(4) 


-Ij^;Yij[q,q,qi,'ir]®ri. 


i  =  1  au 


.p  • 


is  used  [4]. 

Notice  in  equation  (4)  the  similarity  to  the  LMS  learning  rule  (see  [5])  where  the  weight  change 
is  proportional  to  the  error  e  and  the  input  features  X.  In  equation  (4)  the  input  features  are  the 
Yjj[.]  functions  and  the  error  is  e^  . 

Figure  1  shows  the  internal  structure  of  the  proposed  neurocontroller  for  joint  i.  Each 
feedforward  network  module  is  trained  to  provide  one  of  the  Yij[q,q,qr,qr]  functions.  This 
training  can  be  done  offline  since  the  Yjjlq^q^q,.]  functions  are  known  a  priori 
and  are  the  same  for  all  rigid  robots  of  the  same  kinematics  and  number  of 
degrees  of  freedom.  The  inputs  to  the  neurocontroller  are  the  components  of  the  trajectories  as 
required.  The  outputs  of  the  neurocontroller  are  the  control  torques  to  be  applied  at  each  joint  of 
the  manipulator. 


Closed  LQQD-Learn in g  and  Tracking  Via  Exploratory  Schedules 
Theorem  1  [6]:  If  p  <,  n,  and  if 


Rank 


lim  t->  « 


(5) 


then  the  controller  specified  by  equations  (3)  and  (4)  guarantees  global  asymptotic  tracking  and 
parameter  identification  i.e.  q(t)->  qd(t),  and  0(t)  ->  0(t). 

Lemma  1  [7]:  0  ->  0  V  0  e  Rk  ,  k  <  n,  such  that  0  is  not  contained  in 


N| 


YE-Vd.qd.qJ 

lim  t->  °° 


(6) 


where  N(.)  denotes  the  null  space  of  the  matrix  (.),  and  denotes  the  parameter  estimation  error 
vector.  The  implication  of  lemma  1  is  that  if,  k<n  parameters  are  not  in  the  null  space  of  equation 
(6)  then  the  components  of  the  parameter  estimation  error  vector  corresponding  to  the  k  parameters 
are  zero. 


Design  of  Exploratory  Schedules 

Lemma  1  implies  desired  trajectories  (which  specify  q^.q^.q^  )  can  be  designed  such  that  k<n 
specific  columns  of  Y[  qrfqd,qj.qj  will  be  linearly  independent  in  the  limit.  The  exploratory 
schedule  then  consists  of  a  sequence  of  desired  trajectories  which  are  designed  to  learn  different 
components  of  the  parameter  estimation  vector  0,  where  the  number  of  desired  trajectories  is  such 
that  all  p  components  of  0  are  identified. 

Simulation  Results:  2  DOF  Manipulator 

The  simulation  results  reported  were  obtained  using  exactly  computed  values  of  the  Yjj(.] 
functions,  rather  than  the  three-layer  neural  network  form.  This  simulation  provides  experimental 
verification  of  lemma  1.  The  dimension  of  Y(q,  q,  qr,qr]  in  the  2  DOF  case  is  2  x  5  so  that  p>n, 


and  at  most  2  parameters  can  be  guaranteed  to  be  learned  simultaneously. 

The  Exploratory  Schedule  for  the  2  DOF  manipulator  (see  Figure  2)  consists  of  a  sequence  of 
three  desired  trajectories  so  chosen  as  to  guarantee  learning  of  different  parameters.  In  trajectory  1, 
04  and  05  are  learned.  The  time  period  corresponding  to  trajectory  1  is  0  <  t  <  4.2.  In  trajectory 
2,  03  is  learned.  The  time  period  corresponding  to  trajectory  2  is  4.2  <  t  <  7.3.  In  trajectory  3, 
01  and  02  are  learned.  Trajectory  3  corresponds  to  the  time  period  7.3  <  t  <  10.3  . 

The  time  period  corresponding  to  identification  of  04  and  05  is  0  <  t  <  4.2.  As  can  be  seen  in 
figures  3  to  6,  the  parameter  estimation  error  for  04  and  05  approaches  zero  as  all  joint  errors 
approach  zero  at  t=  4.2.  The  time  period  corresponding  to  identification  of  03  is  4.2  <  t  <  7.3. 
The  parameter  estimation  error  for  03  approaches  zero  as  all  joint  errors  approach  zero  at  t=  7.3. 
Parameters  0i  and  02  are  identified  during  the  time  period  7.3  <  t  £  10.3  . 


figure  3:  Joint  position  error  during  ES.  figure  4:  Joint  velocity  error  during  ES. 


tim*  (s*c) 


figure  b:  Estimation  error  for  04  and  0 5. 
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Abstract 

Distributed  detection  and  estimation,  and  sensor 
fusion,  are  at  present  topics  of  intensive  research  due  to 
the  variety  of  systems  which  employ  different  sensing 
devices  at  geographically  separated  sites  for  tasks  such 
as-  target  detection  and  tracking,  diversity  in 
communications  and  threat  classification.  In  this  study, 
we  demonstrate  a  simple  distributed-detection  scheme 
whose  probability  of  error  can  be  calculated  analytically, 
and  show  that  it  corresponds  to  a  two-layer  network  of 
binary  threshold  elements.  We  proceed  to  assume  that 
the  tensors  and  the  fusion  center  are  subject  to  sudden 
unpredictable  changes  in  the  environment  that  they 
survey,  and  show  how  learning  algorithms  can  be  used 
in  order  to  maintain  good  performance,  in  spite  of  these 
change*.  We  conclude  with  an  example,  involving  five 
unequal  seniors  which  distinguish  between  two 
time-varying  Gaussian  populations  of  different  means. 


The  term  sensor  fusion  is  used  to  describe  the 
mechanism  for  combining  data  from  distributed  sensors. 
The  fusion  is  usually  performed  for  purposes  of 
detection  or  estimation,  with  applications  such  as:  target 
detection  and  estimation  in  radar  systems,  diversity  in 


local  decision  as  to  the  existence  (U|-+l)  or 
non-existence  (  U|»  -1)  of  a  target  within  its  volume  of 
coverage.  This  decision  can  be  arrived  at  on  the  basis  of 
a  single,  multiple  or  sequential  observations  at  the  local 
detector’s  site.  Typically,  a  likelihood  ratio  test  will  be 
performed,  with  a  decision  threshold  determined  by  the 
objective  function  (usually  a  Bayesian  risk  or  a 
Neyman-Pearson  criterion.)  The  calculation  of  the 
threshold  typically  involves  the  solution  of  a  set  of 
coupled  nonlinear  algebraic  equations,  a  task  which  can 
become  computationally  un tractable  even  for  a  moderate 
number  of  sensors  (e.g.  Tsitsiklis  and  Athins  (1985), 
Reibman  and  Nolle  (1988).) 

The  decisions  of  tnc  local  detectors  are 
transmitted  to  a  global  detector  (the  Data  Fusion  Center, 
DFC)  over  communication  channels  that  are  practically 
noiseless.  The  DFC  is  to  make  a  global  decision  about 
the  existence  of  the  target  (u-1:  decide  Hj,  or  u— 1: 
decide  H^. ) 

Figure  1  depicts  two  sensor  fusion  systems 
which  differ  in  terms  of  the  capability  to  transmit  local 
decisions  to  the  DFC.  The  first  case  (figure  la)  is 
applicable  when  all  sensors  are  located  at  the  same  site, 

or  when  the  sensors  are  distributed,  and  the  capacity  of 
the  channels  between  the  sensors  and  the  DFC  allows 
practically  error-free  transmittal  of  the  sensor 
observations  (q)  to  the  central  processing  site.  The  DFC 


digital  communications  systems  and  crime 
countermeasures.  For  relevant  references  see 
Bar-Shalom  and  Fortmann  (1988),  Thomcpoulos  et  al. 
(1987)  and  Riebman  and  Noltc  (1987).  In  the  present 
paper  we  concentrate  on  the  distributed  parallel  detection 
problem  with  binary  local  decisions,  depicted  in  figure 
lb.  We  consider  a  system  which  observes  a 
phenomenon  H,  through  a  set  of  sensors  which  may 
have  different  operating  principles,  and  may  be  located  at 
different  sites.  An  example  of  the  former  case  is  a 
combination  of  radar  active  sensors,  operating  at 
different  frequencies  (e.g.  infrared  and  RF  sensors):  an 
example  of  the  latter  is  an  array  of  receiving  antennae, 
located  at  different  physical  sites  -  for  spatial  diversity. 
E*ch  OTe  0f  ,j,c  font-end  sensors,  also  called  local 
detectors,  observes  the  phenomenon  H,  and  makes  a 

Research  supported  in  part  by  NSF  grant  HU  8810186 


obtains  a  vector  of  real  observations,  and  its  decision  is 
based  on  classical  multiple-observation 
hypothesis-testing  theory  (e.g.  Sage  and  Melsa  (1971).) 

When  the  sensors  arc  distributed,  and  capacity 
constraints  do  not  allow  the  transmittal  of  the  observation 
(or  when  the  sensors  are  at  the  same  site  -  but  central 
processing  capabilities  are  limited,)  there  is  a  need  to 
make  partial  decisions  at  the  sensors’  sites,  and  transmit 
those  nartial  decisions  to  the  DFC.  The  extreme  case 
(fully  decentralized  decision-making)  is  depicted  in 
Figure  lb.  Here  each  sensor  makes  a  local  decision: 
u,=l  if  a  target  is  detected,  and  u;=«  -1  otherwise.  Only 
this  binary  decision  is  available  at  the  DFC  as  the 
contribution  of  the  i^  sensor  to  the  decision  process.  The 
DFC  has  to  map  the  binary  vector  of  local  decisions  into 
the  global  decision. 
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Figure  1:  Multiple-observation  hypothesis-testing 

a)  centralized  system 

b)  fully-decentralized  system 


The  basic  questions  that  arise  from  the  described 
situation  are: 

1)  What  are  the  (optimal  and  sub-optimal)  configurations 
that  could  be  employed  for  the  local  sensors  and  for  the 
DFC  when  the  system  is  fully  informed :  By  fully 
informed  we  mean  that  the  statistics  of  the  environment 
are  known  to  the  local  detectors  and  the  DFC;  and  the 
reliability  of  each  local  decision  is  known  to  the  DFC  in 
terms  of  local  probability  of  missed  detection  and  the 
probability  of  false  alarm. 

2)  What  is  the  performance  that  the  fully-informed 
decision  maker  achieves? 

3)  How  can  learning  techniques  be  incorporated  in  the 
decision-making  schemes,  such  that  unknown  or 
time-varying  properties  of  the  environment  could  be 
extracted  from  available  data,  and  used  for  adaptive 
decision  making? 

The  present  paper  contributes  to  the  study  of  the 
second  and  third  problems.  We  are  interested  in 
exploiting  the  ability  of  neural  networks  to  learn,  in  order 
to  maintain  performance  of  the  'ignorant'  distributed 
configuration,  which  is  close  to  that  of  the 


fully-informed  system:  such  performance  is  to  be 
maintained,  in  spite  of  unpredictable  changes  in  the 
a-priori  probabilities  of  the  hypotheses  and  in  the 
signal-to-noise  ratios  at  the  individual  sensor  sites.  We 
develop,  analyze  and  simulate  a  parallel  architecture  with 
local  and  global  binary  neurons  as  the  decision-making 
elements.  These  neurons  adapt  their  weights  and 
thresholds  to  track  the  (unknown)  parameters  of  the 
environment  which  supplies  them  with  data  for  decision 
making. 

Section  II  presents  the  distributed  sensor  configuration 
that  we  study  -  which  considers  two  hypotheses  in  the 
observed  environment,  and  non-communicating  sensors. 
The  configuration  is  relatively  easy  to  implement,  but  is 
suboptimal  with  respect  to  a  minimum  probability  of 
error  performance  criterion.  In  section  HI  we  assume  that 
the  individual  sensors  do  not  have  access  to  the  exact 
statistics  of  the  hypotheses  that  they  examine,  and  that 
the  data  fusion  center  does  not  have  access  to  the  exact 
performance  of  the  individual  sensors  (their  probabilities 
of  false  alarms  and  missed-detections.)  We  present  a 
stochastic-approximation  rule  which  is  used  for  learning, 
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in  order  to  allow  the  system  to  achieve  asymptotically  the 
performance  of  a  system  which  is  fully  informed  of  the 
necessary  probabilities.  Finally,  we  present  in  section  IV 
the  results  of  numerical  studies  of  the  performance  of  a 
five-sensor  target  de lectio*  radar  system,  with  unequal 
and  time-varying  signal-to-noise  ratios  and  a  priori 
probabilities. 


•V 

-Jv-L-t 


°ln  Pr(Hc)  +Km  -T  (3N 

mlnP7H0+“  »•  (3> 


In  this  case  one  can  express  analytically  the  local 
probability  of  error,  as  well  as  the  local  probability  of  a 
missed  detection  and  the  local  probability  of  false  alarm 
(Sage  and  Melsa  (1971),  p.  130): 


The  design  of  a  distributed-sensor  configuration 
for  minimizing  the  probability  of  error,  a  Bayesian  cost 
performance  index,  or  a  Neyman-Pearson  criterion  has 
been  the  subject  of  many  studies  (see  Thomopoulos  et  al. 
1987)  and  Reibmsn  and  Nolte  (1987)  for  a  list  of 
references.)  The  minimum  probability  of  error  criterion 
which  we  are  using  here  is 
on  the  local  level 

Pel  =  P,(Ho)  Pr  (u,  =  1 1  Hq)  +  P^,)  Pr  (u,  *  -1 1  H,) 

(la) 

and  on  the  global  level 

PeC  =  Pt(Ho)Pr(u=l|  Ko)  +  Pf(Hj)  Pr (u  =»  —  1 1  H,) 

(lb) 

The  optimal  local  decision  that  is  necessary  in  order  to 
minimize  P^  does  not  minimize  Pel,  in  general.  It  has 
been  shown  that  the  optimal  local  decision  involves  a 
maximum-likelihood  test  with  a  threshold  whose 
calculation  is  coupled  to  that  of  the  calculation  of 
thresholds  for  alLother  local  detectors.  We  are  using  here 
a  simpler,  and  hence  sub-optimal,  configuration,  where 
each  detector  makes  its  own  local  minimum  probability 
of  error  detection,  rather  than  the  decision  which 
corresponds  to  the  global  minimum  probability  of  error 
decision. 

We  assume  that  the  decision  problem  corresponds  to 
choosing  one  of  two  Gaussian  populations  with  different 
means.  In  other  words  the  detection  system  has  to  decide 
whether  its  i*  observation  zj  is 
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=  m  + v* 


k=  1,2 . K 


where  the  0*'$  are  independent  with  the  probtbdiy  density 

pv(a)=-~«P[-ir  jy*) 

Pro2)!-  ^  20  k»t  J 

^id  K  observations  axe  used  before  a  decision  is  made. 
The  local  minimum  probability  of  error  decision  is 
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Assuming  now  that  each  local  detector  (with  its  o*  :•  ult 
Pm]  and  Pr)  makes  a  decision  according  to  equation  (31, 
the  global  minimum-probability-of -error  decision,  given 
the  heal  decision  rules  is  (Chair  and  Varshney  (1986)): 


Decide u4  =  1  if  ao  +  ajUj  >0 


Decide  U[  =  -1  otherwise 


ao  =  l°g^  (6a) 

1  ,  1  -P.M1  ,  ..  1  .  1  “PF1  /f  , 

3i  =  2  l08“P^“  (u‘ +  +  2  l08_P^T (  } 


The  overall  probabilities  of  missed  detection  and  false 
alarm  can  be  calculated  for  this  system  as  well  -  appendix 
A  gives  the  details.  When  the  noise  variances  at  the  local 
detectors  are  equal,  the  answer  is 
a 

PMO  =  !  -  £  ( ? )  G-Pm)1  Pm1'  U[  Ao  +  a  (2j  -  n)]  (7a) 
j=0  J 

Q 

pR3  =  X  (jH  U[  Ao  +  a  (2j  -  n)]  (7b) 


a  P(Hi)  ,  n,  pM(l-pM) 

A°  ln  P(Hq)  +  2  h  PF  (1-PF)  (7  ) 

,  _i  In  pm  a-pM)  ,7 .. 

T1"  PF(1-PF)  ,(d) 

U(.)  is  the  unit  step  function, 
and  PM  and  PF  have  been  defined  in  equation  (4). 

From  examination  of  equations  (4)  and  (6),  we  note  that 
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at  that  detector,  0  <  aM  <  1. 


the  distributed  decision  system  is  a  two-layer  binary 
neural  network.  The  local  detectors  have  K  real  inputs 
and  one  binary  output  each  (K  -  number  of  observations 
per  decision,)  and  the  data  fusion  center;  n  binary  inputs 
and  one  binary  global  decision  output  (n«  number  of 
sensors.)  Figure  2  depicts  the  overall  system. 


So  far  we  have  assumed  that  the  local  detectors  and  the 
DFC  know  the  a  priori  probability  (P(Ho)),  and  that  the 
DFC  knows  the  probabilities  of  missed  detection  and 
false  alarm  (Pm.  and  Pfi)  of  each  of  the  sensors  which 
feed  it  with  information.  When  these  probabilities  are  not 
available,  we  use  simple  stochastic  approximation  rules 
in  order  to  estimate  them.  These  rules  have  been 
successfully  used  in  other  neural  network  learning 
applications  (e.g.  Kohonen  (1988),  Kam  et  al.  (1988)). 

Approxuiuitwn  of  the  a  priori  probabilities 

At  the  local  detector  level 

i(H0)0t+1)  =  Pi(H/>  +  an  -  P,(H/>)  (8a) 

where  ^(Hq)*  is  the  estimate  of  P(H0)  at  the  i01  local 
detector  at  die  k111  time  step;  and  au  is  the  learning  constant 


At  the  DFC  level 
PcfcmoJ^'^PDrcCHo)01^ 

“fdc  (-J-  -  PDfc(Ho)(1i>) 

where  PdfcCHq)00  is  the  estimate  of  P(H<j)  at  the  DFC 
at  the  k,h  time  step;  and  o^rc  is  the  learning  constant 
°<aDre<  1. 


Approximation  of  the  error  probabilities 
At  the  DFC  level 

Pm.  ^  PM« 00 + 


where  Pm.00  and  PR  are  the  estimates  of  PMi  and 
P  pi  at  the  i*  local  detector  at  the  kmtime  step; 
and  PDFC  is  the  corresponding  learning  constant 
at  the  DFC,  0  <  (3dfc  <  1. 


While  these  approximation  rules  do  exhibit  a.s. 
convergence,  they  nevertheless  are  very  useful  in  many 
applications  that  call  for  on-line  parameter  estimarion.The 
estimated  values  in  equation*  (8)  and  (9)  are  used  in  the 
decision  rules  (3)  and  (5)  -  to  yield  the  global  decision, 
u. 

[V  Example:  a  fiversensor  target  detection  fusion  system 

We  consider  a  fusion  system  for  target  detection  using 
five  unequal  detectors.  The  system  has  been  simulated 
on  an  IBM-AT  using  Turbo  PASCAL  5.0.  The 
presented  results  were  obtained  via  Monte-Carlo 
simulations. 


CASE  T:  Static  environment 

We  have  studied  a  system  of  five  sensors  in  Gaussian 
white  noise  with  signal  to  noise  ratios: 


Sensor  1: 

5.0 

dB 

Sensor  2: 

3.0 

dB 

Sensor  3: 

0.0 

dB 

Sensor  4: 

-3.0 

dB 

Sensor  3: 

-5.0 

dB 

Figures  3a,b  show  the  probability  of  correct  decision  (Pc 
-  1-Pe)  versus  rime,  for  sensors  2,3,  and  5,  along  with 

the  results  for  the  fully-informed  DFC  and  the  learning 
DFC. 


We  observe  that  both  DFC's  perform  on  average  better 
than  each  sensor  alone.  Moreover,  the  learning  DTC 
gradually  approaches  the  performance  of  the 
fully-informed  DFC. 


Figures  4  a,b  show  the  DFC  weights  (a;  in  equation  (6) 
for  sensors  2,3,  and  5.  Weights  are  shown  for  the  cas 
of  a  present  target  (figure  4a)  and  the  case  of  no  target  it 
sight  (figure  4b.)  The  horizontal  lines  are  the  weights  foi 
a  fully-informed  DFC,  the  other  curves  correspond  to  thi 
learning  DFC. 

We  observe  that,  as  expected,  unreliable  sensors  (like  5 
have  lower  weights  than  more  reliable  sensors  (like  2) 
Moreover,  the  learned  weights  converge  to  the  values  ol 
ihe  fully-informed  DFC. 

CAS&JI;  Timff-Vjirvin"  environment .  van/jnp  SNA 

we  now  introduce  an  abrupt  change  (from  -5.0  dB  to 
5  0  fins  .u„ 

...  ,..t  signal  to  noise  ratio  at  sensor  five, 
halfway  into  the  simulation  (after  50  time  steps.)  In 
figure  5a  we  show  one  100-step  run  for  the  probability 
f  correct  decision  for  sensor  5,  the  learning  and 
fully-informed  DFC's.  We  observe  the  expected 


mprovement  in  the  probability  of  correct  decision  for 
both  the  sensor  and  the  DFC's.  Figure  5b  shows  the 
eights  for  sensors  2,3  and  5.  The  weights  of  (the 


2  3  pr°-biiity  °,f  <!orTect  decision  for.  sensors 

full’v^nfo™^  learning  data  fusion  cemer  (L.DFC);  ^ 
iutiy-informed  data  fusion  center  (F.I.DFC). 


Figure  3b:  The  probability  of  correct  decision  for  the  learning 
cemerU(F°inDFC)r  (L>DFC)  *nd  ‘he  fulIy  infoniied  data  fusio" 


sensors  H  ^  c  i  associated  with  the  decisions  of 
data  fusion  ^n,5  n  Cn^.tar2ct  is  P^ent  -  learned  by  the 
data  fusion  center  (FJ.DPC).  ^  f°f  ‘hC  fulIy  informcd 
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Figure  4b:  The  weights,  associated  with  the  decisions  of 
sensors  2,3  and  5  when  the  target  is  not  present  -  learned 
weights  by  the  data  fusion  center  (L.  DFC)  and  fixed  weights 
for  the  fully-informed  data  fusion  center  (F.I.DFC). 


Figure  5a:  The  probability  of  correct  decision  for  sensor  5,  the 
learning  data  fusion  center  (LDFC)  and  the  fully  informed  data 
fusion  center  (F.I.DFC).  A  step  change  in  the  SNR  of  sensor 
5,  from  -5dB  to  5dB,  is  introduced  after  50  time  steps, 
resulting  in  an  improvement  of  the  performance  of  sensor  5  and 
the  DFCs. 


Figure  5b:  The  weights,  associated  with  the  decisions  of 
sensors  2,3  and  5  when  a  target  is  not  present,  learned  by  the 
data  fusion  center  (L.  DFC)  and  fixed  for  the  fully  informed 
data  fusion  center  (F.I.DFC).  A  step  change  in  the  SNR  of 
sensor  5,  from  -5dB  to  5dB,  is  introduced  at  50  time  steps. 


learning)  sensor  5  are  shown  to  converge  asymptotically 
to  the  optimal  a:  weights  of  equation  (6)  (the  horizontal 
lines.) 

CASE  ni:  Timc-varvinp  environment.-  varying  a  nri^* 
Mahahiliiig 

We  now  change  the  a  priori  probability  of  Hq  from  0.3 
to  0.7  halfway  into  the  simulation  (after  50  time  steps.) 
In  figure  6a  we  show  one  100-step  run  for  the 
probability  of  correct  decision  using  the  learning  DFC 
and  the  fully-informed  DFC.  We  observe  that  the 
learning  system  experiences  a  performance  degradation 
after  the  probability  change,  but  is  able  to  recover, 
through  learning,  in  a  short  period.  The  recovery  is  due 
to  the  restoration  of  correct  estimates  for  the  a-priori 
probabilities,  as  shown  for  the  sensors  in  figure  6b.  In 
figures  6c, d  we  show  convergence  to  the  optimal 
weights  both  before  and  after  the  a  priori  probability 
change. 


Figure  6a:  The  probability  of  correct  decision  for  the  learning 
data  fusion  center  (L.DFC)  and  the  fully-informed  data  fusion 
center  (F.I.DFC).  A  step  change  in  the  probability  of  a  target 
being  present  from  0.7  to  0.3  is  introduced  after  50  time  steps, 
resulting  in  a  temporary  degradation  in  performance  of  the 
learning  DFC  (L.  DFC.) 

Conclusion 

Wc  have  considered  a  distributed  detection  system 
whose  performance,  in  terms  of  probability  of  error,  can 
be  calculated  analytically.  We  have  shown  that  the 
system  can  be  realized  as  a  rwn-layer  network  of  binary 
threshold  elements,  and  that  the  weights  of  these 
elements  can  be  estimated,  using  on-line  stochastic- 
approximation  rules  to  form  an  adaptive  detection 
system.  The  results  are  applicable  to  systems  which  use 
a  variety  of  sensors  located  in  separated  geographical 
locations  for  decision  making.  Further  research  is 
necessaty  in  order  to  determine  an  adaptation  scheme  for 
the  learning  constants:  high  learning  constants  encourage 
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Figure  6b:  The  probability  of  a  target  not  being  present,  learned 
locally  by  sensors  2,3  and  5,  compared  >wiih  the  actual  value 
biown  to  the  fully-informed  data  fusion  center  (F.I.DFC).  A 
step  change  in  the  the  probability  of  a  target  being  present,  from 
0.7  to  0.3,  is  introduced  after  50  time  steps. 


Figure  6d:  The  weights,  associated  with  the  decisions  of 
sensors  2,3  and  5  when  a  target  is  present  -  learned  by  the  data 
fusion  center  (L.  DFQ  and  fixed  for  the  fully-informed  data 
fusion  center  (F.I.DFC).  A  step  change  in  the  probability  of  a 


time  steps. 

X  =  (Ab  +  lZ(A|-Bi>]  + 

N 

IX  2{Al  +  Bi>  Ju' 

I-t  1 

(A2) 

where 

a  P(Hl)  .  1  V.  PMid-PMi) 

^  lnP(H0)  +  2  2>  PFi(l-PFi) 

(A3-1) 

a  ,  >-pMi 

Aj  =  ln  p 

“fi 

(A3 -2) 

I  -PFt 

Bj  =  In  -=  — 

‘Mi 

(A3  -3) 

Figure  6c:  The  weights,  associated  with  the  decisions  of 
sensors  2,3  and  5  when  a  target  is  not  present  -  learned  by  the 
data  fusion  center  (L.  DFC)  and  fixed  for  the  fully-informed 
data  fusion  center  (F.I.DFC).  A  step  change  in  the  probability 
of  a  target  being  present,  from  0.7  to  0.3,  is  introduced  after  50 
ume  steps. 

ihe  quick  learning  of  changes,  but  suffer  from  high 


We  consider  first  the  case  of  equal  noise  variances  at  the  local 
detector.  All  detectors  use  the  same  detection  scheme,  have  the 
same  threshold  and  the  same  probabilities  of  missed  detections  and 
false  alarms. 


variance  in  the  estimates.  Small  learning  constants 


PM!  “  PM2 


PMn  =*  PM 


require  long  settling  times.  The  solution  •  adaptive 
learning  constants  -  should  imitate  adaptation 
mechanisms  in  similar  systems,  e.g.  delta  modulation  in 
digital  communications. 

A:  Pctfonwr  of  ihr  distributed  sensor amfigurarifla 
e  overall  expression  for  the  sufficient  statistic  of  the  Chair  and 
varshney's  (1986)  DFC  is 

N 

X  =  a«+Xaiul  (Al) 

kl 

where  a^  and  ^  have  been  defined  in  equation  (6).  This  expression 
can  be  re-written  as 


PFI  51  PF2  *■•••=  Py„  =  Pf 


Let: 


P(  u(  »  +1  |  Hk)  =  p* 
P(Uj=»-l  |Hk)  =  1-  pk  =  qk 
where  k  =  0,1 
Let  x  be  the  random  variable 

t  o  \ 


1 

X=2 


Xu,  +  n 


\i»l  J 

(  x  is  the  number  of  l’s  in  the  observation  vector  at  the  DFC.) 
Clearly, 

-j 


P(x=j|Hk)  =  (jn)piqJ- 


(A4) 


11-63 


j  =  0,  1 . n. 

Equal  variances  We  assume  now  that  the  noise  variance  is  the 
same  for  all  sensors. 

Under  this  condition,  ail  coefficients  of  u1  are  the  same  and  equal 
to 


Pr, !  h0(Yi  I  Ho)  =  PF|  5  + 

( i  -  Pf,  )  s  (y, 

due  to  statistical  indeperJei.ie  of  the  terms, 

Pa,  Ikc(^i  I  H0)=  prj(7)  *  pns(Y)  *  -  *  Pm(7>  (AlO) 


a 


1  Pm(1~Pm/ 
2m  PF(1-PF) 


X  now  has  the  probability  density 
PAlH.tf-lHk)3' 


(A5) 


We  consider  the  set  of  2°  binary  tuples  that  car.  appear  as  an  input 
vector  to  the  DFC.  We  use  the  index  k»  0,  1,  .  .  2M  to 
distinguish  between  the  different  tuples.  The  probability 

distribution  of  X,  under  assumption  Ho  is 


£  (")ptqTJ  5  (X  -  (Ao  +  a[2j-n]))  (A6) 


2*-l 

Pa,  I  K„(*t  I  Ho)  “  2^  Wk  I  Ho  5(z  “  ®k) 

b.0 

where 


(All) 


wlxre  5(«)  is  the  Dirac  delta  function. 

Thu  global  probabilities  of  missed  detection  and  false  alarm  can 
now  be  found  by  integrating: 

PMO«£pA|H,fllH,)dX 

3  If")1  -  Pm^Pm1  U[  a(n-2i)  -  Ao)  (A7-1) 
j-o  J 

PFQ  =  {*PA|H#WH0)dX 


(A7-2) 


(A8-1) 


(A8-2) 


Non-equal  variances  Wc  present  an  algorithm  to  find  the  error 
probabilities  for  the  case  of  non-equal  noise  variances  of  the 
detectors.  We  consider  first  the  second  term  in  equation  A- 1: 

Q  n 

*-i=2X i(Ai + )uf = X  “ft 

i=l  i*l 

with 


l+Dy 


1-llH 


i»l 


=  na-pR) 


!+uU 

J  l“l 


p-l 

and  uu  is  the  i*  bit  of  the  klh  tuple, 
and 


COfc 


i-l 


A,+B, 


-ukj 


(A  12) 


(A  13) 


To  find  the  probability  of  false  ala  m,  we  shall  sum  all  weights 


Wk  such  that  their  corresponding  >  -Aq.  The  calculation  of  the 
probability  of  missed  detections  follows  a  similar  path. 
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NEUROMORPHIC  ADAPTIVE  CONTROL 
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ABSTRACT 

A  neuromorphic  unsupervised  parallel  distributed  adaptive 
controller  for  a  class  of  nonlinear  systems  is  proposed.  It  is  shown  to 
provide  bounded  tracking  and  asymptotic  regulation  following  a  class 
of  'teacher  models.'  A  double  link  planar  manipulator  example  is 
provided  for  a  simulative  illustration  of  the  properties  of  the 

1.  PROBLEM  FORMULATION 
An  unsupervised  distributed  parallel  computing  architecture  is 
proposed  for  the  adaptive  control  of  nonlinear  dynamic  systems  of 
the  class 

x(t)  =  A(x)x(t)  +  B(x)u(t)  (1) 

y(t)  =  C(x)x(t)  +  D(x)u(t)  (2) 

with  some  degree  or  other  of  uncertainty,  where  x(t)e  Rn  is  the  plant 

state  vector.  y(t)e  Rm  is  the  output  vector,  and  u(t)e  Rm  is  the  input 
command  vector,  and  where  A(x).  B(x),  C(x),  and  D(x)  are 
uniformly  bounded  matrices  of  corresponding  dimensions.  This 
representation  includes  the  linear  time  invariant  systems  and  allows 
for  extentions  of  the  following  presentation  to  certain  classes  of 
nonlinear  systems  like  manipulators. 

Figure  1  presents  a  schematic  illustration  of  a  common  problem 

in  modem  control.  If  perfect  knowledge  on  the  parameters  ©  of  the 
linear  plant  is  available,  one  has  many  ways  to  design  his  "best" 
controller  represented  here  by  the  fixed  gains  K.  Since  perfect 

knowledge  is  not  available,  in  general,  one  uses  some  estimate  ©  of 

A 

the  parameters  of  the  plant  to  design  the  "best"  controller  K,  based 
on  this  estimate.  The  controller  K  must  control  the  real-world  plant 
©,  and  it  is  not  necessarily  a  good  contro'ler  for  this  real  plant.  Still, 

A 

there  is  some  domain  where  any  fixed  gains  K  would  guarantee,  at 
least,  stability  of  the  control  system.  However,  in  order  to  achieve 
better  performance,  modem  control  usually  employs  time-varying 
nonlinear,  adaptive,  or  intelligent  algorithms  to  compute  dynamic 

estimates  of  the  parameters  ©(t)  of  the  plant,  or  directly  the 
parameters  of  the  controller,  which  are  thus  nonstationary  or 

nonlinear  functions  K(t).  But  before  mentioning  performance,  how 
can  we  guarantee  stability  of  these  nonstationary  systems.  Intuitively, 
one  may  assume  that  if  the  nonstationary  gains  leave  the  stability 
region  that  was  established  for  the  fixed  gains,  one  might  get  into 
troubles.  Unfortunately,  stability  with  nonstationary  gains  is  not 
guaranteed,  in  general,  even  if  one  is  cautious  and  does  impose  upon 
the  variable  gains  the  bounds  of  stability  that  were  established  for  the 
fixed  gains  K  [1]. 

Therefore,  nonstationary  or  nonlinear  controllers  must  usually 
assume  slow  variation  of  the  estimated  parameters  and  of  the  control 
gains  (which  then  makes  them  "almost"  fixed).  Most  adaptive 
methods,  in  particular,  require  external  persistent  excitation  (to 
guarantee  convergence  of  the  parameters),  and  assume  the  prior 
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knowledge  or  an  upper  bound  on  the  order  of  the  unknown  controlled 
plant.  Prior  knowledge  on  the  pole-excess  in  the  plant  is  also  needed. 
This  knowledge  is  needed  for  design  and  then  used  to  implement 
observer-based  controllers  of  the  same  order  as  the  plant  (2). 


BOUNDARY  OF 
STABILITY  WITH 
FIXED  GAINS 


Figure  1 :  The  modem  control  problem. 

We  propose  a  nonlinear  neuromorphic  adaptive  controller  that 
maintains  stability  under  the  same  bounds  that  would  guarantee 
stability  with  fixed  linear  controllers.  The  proposed  controller  is 
depicted  in  Figure  2  below:  It  consists  of  a  teacher  model  which 
incorporates  the  knowledge  regarding  the  desired  input/output 
plant  response  as  well  as  'he  repertoir  of  reference  commands  that 
the  system  may  be  subjected  to. 


REFERENCE  CONTROL  OUTPUT 


Figure  2:  Proposed  Neurocontroller 

The  teacher  dynamic  model  is  assumed  to  have  the  following 
representation: 

x, (t)  =  A,(x,)x,(t)  +  BtCx^UtO)  (3) 

y. (t)  =  C,(x,)x,(t)  +  D,(Xt)U[(t)  (4) 

where  x,(t)e  R"1  is  the  state  vector,  y,(t)s  Rm  is  the  output  vector,  and 

u,(t)e  Rm  is  the  input  command  vector,  and  where  A,(xt),  Bt(xt), 
C,(x,),  and  D,(xt)  are  uniformly  bounded  matrices  of  corresponding 
dimensions.  It  is  emphasized  that  the  dimension  of  the  model  is 
unrestricted,  except  that  dim(y,)  =  dim(y)  =  m. 


The  parallel  distributed  adaptive  controller  has  structure  similar  to 
the  LMS  adaptive  layer  [3],  and  to  many  other  neurocontroller 
architectures  [4].  It  receives  the  input  'features'  vector  f(ut,  x,,  v) 
and  generates  as  an  output  the  process  control  vector 

utt)  =  K(t)f(u„  x„  y)  (5) 

where  K(t)  is  the  adaptive  gain  matrix  of  appropriate  dimension. 
Each  Kjj  gain  monitors  the  sensitivity  of  the  i-th  control,  namely  up 
to  the  j-th  feature  of  the  system,  namely  fj(u,,  x,,  y)  and  may  be 
viewed  as  the  state  of  the  i/'-th  neuron  (Fig.  3). 


Figure  3  •  Neurocontroller  Architecture 
a)  General  Block  Diagram  b)ij-th  Neuron  Architecture. 

The  Kjj  gain  adjusts  its  value  independently  of.  and 
simultaneosly  with  all  other  gains,  according  to  the  adapnve  rule: 

K,/t)  =  Mjj(t)  +  Ni/t) 

Mij(0  =  aij(y[.-yi)fj  <7‘ 

^Nijd)  =  -  f3iJNij(t)+  Yij(yH  -  yi  )fj  (8) 


For  the  above  described  control  architecture  we  use  recent  results 
by  Bar-Kana  [5, 6]  to  show  that,  under  some  realistic  assumptions, 
large  classes  of  nonlinear  plants  of  the  form  (l)-(2)  with  adaptive 
controllers  of  the  form  (5)-(8)  can  perform  good  trajectory  tracking 
and  also  guarantee  robust  adaptive  stabilization  in  the  presence  of  any 
bounded  input  commands  and  input  or  output  disturbances. 
Furthermore,  this  adaptive  controller  has  ben  shown  to  have  good 
graceful  degradation  (  or  fault  resistant  )  properties  with  control 
failure  or  fast  changes  of  plant  parameters  {7]. 

MAIN  RESULTS 

We  describe  below  a  summary  of  our  main  results  regarding  the 
performance  of  the  controller  in  (6),  (7),  and  (8).  We  will  assume 
that  if  a  plant  is  stabilizable,  the  resulting  stable  configuration  is 
"exponentially  stable"  as  defined  below: 

DEFINITION  1  (Hahn)  (8):  Let  the  general  nonlinear  system 

be  represented  by  the  n  th-order  vectorial  equation  x  (t)  =  f(x,t)  and 
let  x=0  be  an  equilibrium  point.  The  equilibrium  point  is  called 
"exponentially  stable"  if  all  solutions  x(t)  satisfy  the  relation  lx(t)l 

<  a  lx(0)l  e'®5'  for  some  scalars  a  >  0  and  (3  >  0. 

THEOREM  1  [8]:  Let  the  right  hand  of  the  equation  x  (t)  = 
ffx.t)  have  bounded  continuous  first  order  partial  derivatives.  Let  the 
equilibrium  be  exponentially  stable.  Then  there  exists  a 
Lyapunov  function  V(x,t)  which  satisfies  estimates  of  the  form 


1.1  a,lx(t)l2SV(x,t)Sa2lx(t)l2 


1.2 

1.3 


V(x,  t)  <  -  ctj  lx(t)l2 
1^4  <  a.  lx(0l! 


i=l,2, . n 


for  ct|,  a2,  a3,  and  a4  positive  constants. 

Because  we  deal  with  nonlinear  systems  we  cannot  expect  to 
show,  like  in  linear  time-invariant  systems,  existence  of  positive 
definite  quadratic  Lyapunov  functions  of  the  form  V(x)  =  xT(t)Px(t) 
where  P  is  constant  and  positive  definite.  However,  after  some 
experience  with  specific  nonlinear  systems  like  robots,  and  because 
we  restrict  our  discussion  to  nonlinear  systems  linear  in  control  of  the 
form  (l)-(2).  we  assume  that  exponential  stability  of  the  autonomous 


system  (1),  with  utt)  s  0,  implies  existence  of  Lyapunov  functions 
V(x)  which  are  not  explicit  functions  of  time.  Also,  since  then  we  can 

always  write  V(xj  =  xT(t)P(x)x(t)  and  thus  tyx)  =  xT(t)|Rx)  + 


P(\)A(x)+AT(x)P(x)]x(t)  or  \(x)  =  -  xT(t)Q(x)x(t),  we  will  use  in 
the  subsequent  discution  the  following  assumption: 

ASSUMPTION  1:  Exponential  stability  of  the  autonomous 

system  (1)  (with  u(t)  =  0)  implies  existence  of  Lyapunov  functions 


of  the  form  V(x)  =  x‘(t)P(x)x(t)  and  derivative  of  the  form  \(x)  =  - 
xT(t)Qtx)x(t).  where  P(x)  and  Q(x)  are  positive  definite  for  all  x  s 
After  establishing  the  basic  definitions  and  facts,  we  can  start 
presenung  the  properties  of  the  neuromorphic  adaptive  controller.  A 
first  result  will  illustrate  the  stabilizing  properties  of  the  main  term  of 
the  nonlinear  adaptive  controller  that  we  propose.  To  this  end.  let  us 
,1'Mime  that  the  plant 


x(t)  =  A(x)x(t)  +  B(x)u(t) 


(9) 


where  an,  (Jn  and  y,.  are  positive  constants.  We  emphasize  that  the 
adaptation  law  (6)-(8)  is  similar  in  structure  to  the  Widrow-Hoff  rule 
(3)  modified  with  a  momentum  term.  We  will  show  later  that  the 
modified  structure  also  includes  a  supplementary  term  which  takes 
care  of  the  stabilization  of  unstable  systems.  The  adaptation  is 
performed  in  parallel  and  in  a  distributed  fashion,  that  is.  the  Kjj 
gain  only  needs  data  from  the  j-th  feature  and  i-th  output 
components.  Thus,  very  large  scale  dynamic  systems  may  be 
considered  for  this  controller.  The  adaptive  gams  consist  of  two 
terms:  a  "proportional"  term,  Mjj(t),  and  an  "integral"  term.  Njj(t). 
Notice  that  the  role  of  the  teacher  (3j-(4)  is  to  demonstrate  to  the 
adaptive  controller  what  should  be  the  appropriate  and  desired 
response  yt  for  any  specified  reference  input  Uj.  Since  the  teacher's 
model  is  incorporated  in  the  controller  structure  (Figure  2).  it  yields 
the  so-called  ’unsupervised'  learning  or  adaptation. 


v(t)  =  C(x)x(t)  (10) 

can  be  stabilized  by  some  constant  output  feedback.  In  other 
words,  the  resulting  (fictitious)  closed-loop  sys'.em  is 
exponentially  stable  according  to  assumption  1.  Let  us  define  the 

ya(t)  =  y(t)  +  Dud)  (11) 

where  D  =  K'y\  and  use  the  adaptive  algorithm 

u(0  =  -K(t)y.(t)  (12) 

with  the  integral  adaptive  gain  K(t)  =  N(t)  given  by 

N(0  =  y»(t)y[(t)r  (13) 


(where  T  is  a  selected  positive  definite  scaling  matrix)  or  (for  each 
scalar  gain) 


a?N,-T,jV* 

(14) 

THEOREM  2:  The  simpfc  algorithm  (12)-(14)  guarantees 
boundedness  of  all  values  involved  in  the  adaptation  process, 
namely,  states,  outputs,  and  adaptive  gains,  and  asymptotically 
perfect  regulation  for  the  augmeafcd  system 

x(t)  =  A(x)x(t)  +  B(x)u(t) 

(15) 

y(t)  =  C(x)x(t) 

(16) 

y.(t)  =  y(t)  +  Du(t)  =  C(x)x(t)  +  Du(t) 

(17) 

such  that  y(t)  ->  0  and  y,(t)  ->  0  as  t  ->  The  gains  are  bounded 
and  ultimately  reach  constant  values  (Appendix  A).  It  is  worth 
mentioning  that  Ky  can  be  any  stabilizing  constant  gam.  Actually, 
some  estimate  of  the  maximal  admissible  gam  is  sufficient  to 
guarantee  stability  of  the  adaptive  controller,  with  no  dependence  on 
the  rate  of  variation  of  the  gains. 

We  want  to  use  the  stabilizing  nonlinear  algorithm  (12),  in 
combination  with  other  adaptive  terms  in  order  to  implement  a  stable 
trajectory-following  adaptive  algorithm.  Let  the  teacher  generate  the 
desired  trajectories  that  the  plant  must  follow.  Let  the  feature  vector 

f(ut,  x„  y)  =  [  (y,  -  y)T,  xj,  u[  ]T  (18) 

where  f(u,,  x,,  y)  uses  all  values  that  can  be  measured,  like  the  input 
commands,  the  teacher’s  states,  and  the  tracking  errors.  We  could  try 


to  use  the  adaptive  algorithm 

Ki/t)  =  M,j(t)  +  Njj(t) 

(19) 

Mt,(t)  =  ot,j(yij  -  yj  )fj 

(20) 

=  )fj 

(21) 

for  perfect  tracking  in  idealistic  environments.  However,  we  are 
aware  that  (19)-(21)  must  further  be  adjusted  to  be  applicable  in 
realistic  environments.  It  is  clear  that  the  perfect  integrator  in  (21) 
increases  without  bound  whenever  perfect  tracking  is  not  possible. 
Since  the  nonlinear  system  includes  uncertainties  that  we  do  not 
assume  to  know  or  identify,  and  since  moreover,  input  and  output 
disturbances  may  usually  be  present,  we  do  not  try  to  proof  perfect 
tracking,  which  could  be  obtained  m  some  ideal  situanon,  but  rather 
concentrate  onrobust  stability  under  nonideal  conditions.  By  "robust 
stability"  we  mean  boundness  of  all  values  involved  in  the 
adaptation  process  like  states,  adaptive  gains  and  errors,  and  tracking 
with  arbitrarily  small  tracking  errors.  Perfect  tracking  in  idealistic 
conditions  is  a  particular  case  of  the  general  robust  tracking  under 
nonideal  conditions.  The  only  modification  of  (19M21)  needed  to 
guarantee  robustness  of  the  adaptive  system  is  the  addition  of  the  by¬ 
pass  term  -p^N^t)  to  the  right  term  of  (21).  which  gives  the 
complete  adapuve  algorithm  (5)-(8).  It  can  be  shown  that  under  the 
assumptions  of  theorem  2,  the  adapuve  algorithm  (Sh8)  guarantees 
robust  stability  of  the  system  (15)-(17)  [9], 

It  is  also  worth  mentioning  that  we  assumed  output  stabtlizability 
via  constant  feedback  only  for  a  simple  introduction  of  parallel 
feedforward  .  Let  us  assume  now  that  the  plant  needs  some  general 
dynamic  configuration  to  reach  stability.  Specifically,  if  G(0  is  some 
nonlinear  system  of  the  form  (9)-(  10)  and  if  H:  (Af,  Bf„  Cf„  Df.)  is 
a  stabilizing  controller,  the  adaptive  algorithm  (5M8)  guarantees 
robust  stability  of  the  augmented  system  G,(-)  •  G(-)  +  H'1  [9], 
More  important,  when  nonphysical  improper  linear  controllers  H  are 
desireded  to  control  the  nonlinear  plant,  we  can  use  their  proper 
inverse  in  parallel  with  the  plant.  This  way,  we  only  use  the 
knowledge  on  the  existence  of  an  improper  controller  and  actually 
use  a  proper  configuration  in  parallel  with  our  plant.  Specifically,  as 
in  the  case  of  nonlinear  robotic  manipulators,  PD  controllers  of  the 
form  H(s)  =  K(l+qs)  can  stabilize  the  manipulators,  and  if  K  is  very 


large,  we  can  get  very  good  tracking  in  ideal  situation.  However,  in 
practice  we  do  not  want  to  employ  either  differentiators  or  high  gains 
in  control  loops  and  indeed,  the  adaptive  controller  only  uses  the 
knowledge  on  their  mere  existence,  to  implement  simple  first  order 


poles  of  the  form  H'1(s)»  in  parallel  with  the  plant,  where  D-> 
K'1  can  be  a  very  small  gain. 

Notice  that  we  do  not  guarantee  any  more  that  the  plant  is 
perfectly  tracking,  because  the  best  we  can  obtain  is  yt(t)  »  y(t)  + 


Du(t)  -»  y,(t),  although  we  actually  want  y(t)  — >  y,(t).  Still,  if  the 
maximal  admissible  gain  is  large  compared  with  the  gain  of 
the  plant,  then  we  can  use  D=K^J,  and  get  y,(t)  =  y(t)  +  Du(t)  -  y(t). 
For  a  quite  common  example,  assume  that  the  gain  of  the  plant  is  10 
and  the  maximal  stabilizing  gain  K,,,,,  =  100.  Then  we  use  D  = 
1/100  =  0.01  in  parallel  with  10  and,  as  the  illustrations  bellow  and 
the  references  show,  y,(t)  -  y(t)  for  all  practical  purposes. 


ROBOTIC  EXAMPLE 

In  this  section  we  apply  the  proposed  neurocontroller  (5)-(8)  to 
position  control  of  a  double  link  planar  manipulator  [10].  This 
example  contains  all  major  dynamical  components  such  as  gravity, 
Coriolis  and  centripetal  terms. 


Figure  4  -  Double  link  Manipulator 


Figure  4  describes  the  double  link  system  configuration.  The 
equations  of  motion  are: 

u,  =  DnS,  +  D1282  +  +  26,62)  +  D,  (22) 

Uj  —  Dj2^l  +  D22^2  "  Ddf  +  D2  (23) 


where 

L,  =  0.1  m,  L2  =  0.1  m,  m]  =  0.6  kg,  m2  =  0.6  kg,  g=10  m/s2, 

Djj  =  (ni|+  n^L,  +m2L2  +2m2L]L2Cos©2, 

2 

Dj2  =  +  m2L1L2Cos02, 

D22=  m2L2 , 

D  =  -m2L|L2Sin02. 

D|=(m]+  m2)gL1sin01+m2gL2sin(01+02), 

D2  =  m2gL2Sin(0]+  ©2). 

and  where  g  is  gravity,  u,  and  u2  arc  torques  at  the  first  and  second 
joints,  m[  and  m2  are  masses,  and  L|  and  i^are  lengths  of  the  first 
and  second  link  correspondingly. 

We  only  employ  posmon  measurements  and  the  outputs  of  the 
plant  are 


yi  =©i 

V;  =©2 


For  the  teacher  we  used  the  simple  decoupled  model: 


*(0  = 


--25 

0  - 

.  0 

-25. 

(24) 

(25) 


(26) 


and  the  adaptive  control  system  was  tested  with  demanding  square- 
wave  input  commands.  In  paralel  with  the  plant  (22)-(25)  we  employ 
the  supplementary  feedforward  (5, 1 1] 


and  the  adaptation  coefficients  were 

ctij  =  71)  =  100000.;  pjj  =  0.01  (29) 

Results  of  simulations  are  shown  in  Figure  5.  Figure  5a 
compares  the  first  plant  and  model  output,  and  figure  5b  shows  the 
second  output.  Figure  5c  shows,  for  ilustration,  the  behavior  of  the 
adaptive  gains  Ku(t)  and  KjjW-  It  can  be  seen  how  the  adaptive  gain 
moves  up-and-down  in  accord  with  the  specific  operational  needs. 
Figure  5d  represents  the  norm  of  all  plant  states,  to  show  that  no 
hidden  state  diverges.  All  values  remained  bounded  while  the  plant 
tracked  with  small  errors,  although  we  used  only  position  (no 
velocity  and  no  acceleration)  output  measurements. 


Figure  5.  Example:  (a)  Output  y,  (t),  yt(t); 

(b)  Output  y,2(t),  y2(t);  (c)  Gains  Ku(t),  Kii(t); 

(d)  Norm  of  all  plant  states. 

CONCLUSIONS 

This  paper  presents  a  ncuromorphic  computing  architecture  for 
adaptive  control.  Starting  with  some  prior  assumptions  about 
stabilizability  of  the  plants  it  results  in  a  stable  unsupervized 
architecture.  The  feasibility  of  the  method  is  demonstrated  on  an 
example  of  robotic  manipulator. 

APPENDIX  A  •  PROOF  OF  THEOREM  2 

It  is  easy  to  see  that  by  using  the  controller 

(A.l) 


with  (15M17),  we  get  the  closed-loop  system  (Figs.  6  and  7) 
x(t)  =  Ac(x)x(t)  +  Bctou^t)  (a.2) 


y»(t)  =  Cc(x)x(t)  +  Deue(t) 

(A.3) 

where 

Ac(x)  =  A(x)  -  B(x)K«.(x)C(x) 

(A.4) 

Kee(x)  =  Ke[I  +  DKt]->  =  [I  +  KeD)->Ke 

(A.5) 

Bc(x)  =  B(x)[I  +  KtDl-» 

(A. 6) 

Cc(x)  =  (I  +  DKt)-iC(x) 

(A.7) 

Dc  =  ( I  +  DK^  ]-,D  *  D[  I  +  lyD  ]•' 

(A. 8) 

where  D  >  0  and  Dc  >  0,  which  means  that  D  and  Dc 

are  positive 

definite. 


Figure  6:  The  closed-loop  configuration. 


DEFINITION  2.  The  closed-loop  system  (A.2)  -  (A. 3)  is 
called  strictly  passive  (.SP)  if  it  sausfies  the  following  relations 

P(x)  +  P(x)Ac(x)  +  Aj(x)P(x)  =  -Q(x)  -  LT(x)L(x)  (A.9) 
P(x)Be(x)  =  Cj(x)  -  LT(x)W  (A.10) 

DC  +  DJ  =  WTAV  (A.l  1) 

for  some  uniformly  bounded  positive  definite  matrices,  P(x)  and 
Q(x),  and  some  matrices  L(x)e  Rmxn  and  We  Rmim,  and  where 


Up(t)  =  -K^y/t)  +  upc(t) 


fy^dPWaPWdx 

"W  dt  a*  <F ' 

DEFINITION  3:  The  augmented  open-loop  system  (13)  -  (17) 
is  called  "almost  strictly  passive"  (ASP)  if  there  exists  a  positive 
definite  static  feedback  matnx  Kt  (unknown  and  not  needed  for  any 
implementation)  such  that  the  resulting  closed-loop  system  (A.2)- 
(A.3)  is  strictly  passive  (SP). 

LEMMA  1:  If  Ky  stabilizes  (9)-(10),  then  the  augmented  open- 
loop  (15)-(17)  is  "almost  stricdy  passive"  (ASP)  (6). 


The  lemma  and  the  various  definitions  will  be  used  for  the  proof  of 
Theorem  2.  The  adapdve  control  system  of  (15)-(17)  with  the 
adaptive  controler  (12)-(13)  is 


y,(t)  =  C(x)x(t)  +  Du(t)  =  C(x)x(t)  -  DN(t)yt(t)  (A.  12) 

y.(t)  =  C(x)x(t)  -  D[N(t)  -  KJy,(t)  -  DK.y.O)  (A.  1 3) 

yt(t)  =  ( I  +  DK*  ]  *C(x)x(t)  -  ( I  +  DIQ  ]  >D(N(t)  -  KJy,(t)(A.14) 


y»(t)  =  Cc(x)x(t)  -  Dc[N(t)  -  KJyt(t)  (A.  1 5) 

x(t)  =  A(x)x(t)  -  B(x)N(t)y,(t) 

=  A(x)x(t)  -  BfxJK^y.O)  +  B(x)Key,(t)  -  B(x)N(c)ya(t) 

=  A(x)x(t)  -  BWICAWxW  +  B(x)KLDr[N(t)  -  KJyt(t) 
-B(x)[N(t)-KJy,(t) 

=[A(x)  -  B(x)KecC(x))x(t)  -Bc(x)[N(t)-KJyt(t) 

=  AcfxJxO)  -  Bc(x)[N(t)-KJy,(t)  (A.  16) 


We  must  prove  stability  of  all  the  values  involved  in  the  adaptation 
process,  like  states,  outputs,  gains.  Let  us  select  the  quadratic 
Lyapunov  function 

V(t)  =  xT(t)P(x)x(t)  +  tr[(N(t)  -  Ke)r'(N(t)  -  Kf)T]  (A.  17) 

where  t r  denotes  trace  and  where  K,.  denotes  the  unknown  ideal  gain 
that  makes  the  plant  strictly  passive..  Notice  that  the  second  term  in 
(A.  17)  is  only  a  short  notanon  for  the  sum  of  all  terms  of  the  foim 

[N  (t)  -  KjVft..  Therefore,  (A.  17)  is  a  positive  definite  quadratic 
function  of  all  states  and  adaptive  gains.  The  derivative  of  the 
Lyapunov  function  along  the  trajectories  of  (13)  and  (A  16)  is 


V(t)  =  xT(t)P(t)x(t)  +  xT(t)P(x)  i(t)  +  xT(t)P(x)x(t) 

+  tr((N(t)  -  Kjr'^Ml+trfflMrVNO)  -  Ke)T] 

=  xT(t)[  P(t)  +  P(x)Ae(x)  +  Aj(x)P(x)]x(t) 

-  xT(t)P(x)Bc(x)[N(t)-KJyf(t) 

-  yI(t)(N(t)-KJTBj(x)P(x)x(t) 

+  trt(N(t)  -  Ke)r'rys(t)yl(t)] 

+  triy.Wylwrr'lNd)  -  Ke)T]  (A.  13) 


By  using  (A.9)-(A.  1 1)  we  get 
V(t)  =  -  xT(t)Q(x)x(t)  -  xT(t)LT(x)L(x)x(t) 

-  xT(t)Cl(x)[N(t)-KJyt(t)  -  yI(t)[N(t)-Ke)TCc(x)x(t) 

-  xT(t)LT(x)W[N(t)-Kjyt(t)  -  yI(0[N(t)-KJTWTL(x)x<i) 

+  xT(t)CT(x)[N(t)-Ke]yt(t)  +  yT(t)[N(t)-KeJTCc(x)x<t) 

-  yJt)[N(t)-K,]TDj(N(t)  -  Kc]y4(t) 

*  y»(0[N(t)-KJTDc(N(t)  -  KJy,(t) 

=  -xT(t)Q(x)x(t)  -  xT(t)LT(x)L(x)x(t) 

-  xT(t)LT(x)W[N(t)-igy1(t)  -  yI(t)[N(t)-Ke!TWTL(vw.(;i 

-  yI(t)(N(t)-KJTWTW[N(t)  -  KJy,(t) 

=  -xT(t)Q(x)x(t) 

-  (L(x)x(t)  -  W[N(t)  -  Kely,(t))T(L(x)x(t)  -  W(N(t)  •  KJy4(t)l 

SO  (A. 19) 


Notice  that  we  only  claim  that  the  derivative  of  the  Lyapunov 

function  in  (A.  19)  is  positive  semidefinite,  namely  \{t)  2  0.  The 
selected  positive  definite  quadratic  form  of  V(t)  then  guarantees  that 
all  states  and  adaptive  gains  are  bounded.  The  trajectories  of  the 


adaptive  control  system  finally  reach  the  region  defined  by  V  (t)  a  0 

[13].  It  is  easy  to  see  that  V(t)  a  0  implies  x(t)  a  0.  Therefore,  if 
we  ignore  the  adaptive  gains,  the  adaptive  control  system  is 
asymptotically  perfect  regulator,  but  what  about  the  gains?  First,  we 

proved  already  that  the  gains  are  bounded.  Second,  x(t)  a  0  implies 

y(t)  h  0  and  y,(t)  a  0,  which  implies  N(t)  s  0  and  then  N(t)  = 
constant.  Therefore,  N(t)  ultimately  reaches  some  constant  value 
such  that  the  plant  is  perfect  regulator,  as  the  theorem  claims. 
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a n st R A Cl 

In  classical  model-based  techniques,  construction  of  a  control 
architecture  capable  of  accurately  tracking  a  desired  trajectory 
throughout  the  state-space  requires  a  detailed  knowledge  of  the 
controlled  system’s  dynamics.  However,  the  system  dynamics  are 
rarely  fully  known  at  the  time  of  controller  design.  In  this  paper, 
we  modify  our  earlier  ncurocontrollcr  architecture  [1]  so  as  to 
guarantee  the  performance  of  the  neurocontroller.  An  innovative 
aspect  of  this  architecture  is  in  the  use  of  a  priori  knowledge  of  the 
general  structure  of  the  system's  dynamics.  The  knowledge  is 
utilized  for  the  selection  of  Exploratory  Schedules  (ES)  to  excite 
selected  subsets  of  the  dynamics.  The  controller  does  not  require  a 
priori  knowledge  of  the  exact  system  dynamics,  as  they  are  learned 
online.  Nor  does  it  assume  the  existence  of  an  explicit  external 
teacher.  The  developed  control  architecture  is  also  not  limited  to 
tracking  of  a  prespecified  trajectory.  The  architecture  is  developed 
through  application  to  the  control  of  a  robot  manipulator. 

i  in’TROOUCLIQN 


A  number  of  different  neural  network  models  and  neural 
learning  schemes  have  been  applied  to  system  controller  design 
with  varying  degrees  of  success  |2|.(3],[4],[5],(6],[7],(8],[9]).  In 
general,  the  work  surveyed  assumed  very  little  a  priori  knowledge 
of  the  structure  of  the  open  loop  system  and  in  none  of  the  cases 
was  there  proof  of  stability  of  the  closed  loop  system.  In  system 
controller  design,  proof  of  stability  of  the  closed  loop  system  is 
essential.  Furthermore,  if  ncurocontrollers  are  to  successfully 
compete  with  currently  available  controllers,  they  must  be  shown  to 
have  performance  that  is  at  least  comparable  if not  superior.  In  this 
work  we  incorporate  a  priori  knowledge  of  the  plant’s  dynamics  in 
the  neurocontroller  design.  It  is  shown  that  assuming  a  dynamic 
model  for  the  plant  allows  the  design  of  a  neurocontroller  with 
guaranteed  performance.  The  dynamic  model  used  for  the 
development  of  the  neurocontroller  is  that  of  a  rigid  robot 
manipulator. 

Selection  of  training  examples  to  provide  efficient  learning  is 
also  an  issue  which  is  rarely  addressed.  In  this  article,  training 
example  selection  for  ncurocontrollers  is  provided  with  the 
introduction  of  Exploratory  Schedules  (ES).  ES  are  trajectories 
which  are  designed  so  as  to  produce  examples  which  allow  the 
neurocontroller  architecture  to  learn  efficiently  by  exciting  selected 
subsets  of  the  dynamics. 
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A  general  block  diagram  of  the  proposed  system  for  a  robot 
manipulator  is  shown  in  figure  1.1 .  It  consists  of  the  Exploratory 
Schedule  Generator,  Neurocontroller  and  associated  Learning 
Algorithm,  the  Robot,  and  a  mechanism  which  allows  selecting 
between  the  User  /  Path  Planner  and  ES  Generator  as  the  originator 
of  the  desired  trajectory  (qd). 


figure  1.1:  Block  diagram  of  the  proposed  system. 

In  section  2,  a  priori  knowledge  of  the  incompletely  known  plant 
dynamics  are  utilized  such  that  the  closed  loop  system  consisting  of 
the  Neurocontroiler,  Learning  Algorithm,  and  Robot  can  be  shown 
to  asymptotically  track  a  desired  trajectory.  Furthermore,  a  priori 
knowledge  is  also  used  to  select  exploratory  schedules  which  can 
be  shown  to  provide  asymptotic  identification  of  the  dynamics  that 
are  not  a  priori  known.  Section  3  gives  results  of  simulations  for  a 
2  DOF  manipulator.  Section  4  concludes  the  article. 

2.  Nftirnrontrollcr  DetiVn  for  n  Rim'd  Robot 
Manipulator 


Robot  Dynamic  Yfmtpl 

The  closed  form  dynamics  obtained  by  the  Lagrange-Euler 
formulation  has  the  general  matrix  form  of  [  10J 

D(q)q  +  C(q,q)q  +  G(q)  =.  t  (2.1) 

where 

D(q)  is  an  n  x  n  matrix  of  ineru'al  terms, 

C(q',q)  is  an  n  x  n  matrix  of  coriollis  and  centrifugal  terms  , 

G(q)  is  an  n  x  1  vector  of  gravitational  terms, 
q  is  an  n  x  1  vector  of  joint  coordinates, 
t  is  an  n  x  1  vector  of  forces/torques, 
n  is  the  number  of  degrees  of  freedom. 


The  formulation  results  in  n  second-order,  coupled  differential 
equations.  (2.1)  can  also  be  expressed  as 

t  -  D(q)q  +  C(q,q)q  +  C(q)  -  Y(q,q,q,q)0  (2.2) 

where  Y(q,c],q,q)  is  an  n  x  p  matrix  of  known  functions  ard  0  is 
an  p  x  1  vector  of  weighting  constants. 

N’eurncontroller  Architecture 

When  (2.2)  is  written  in  terms  of  the  individual  torques  at  each 
joint,  it  can  be  viewed  as  a  single  layer  linear  network,  where  the 
inputs  to  the  network  are  the  Y;j(.)  and  the  weights  are  the  0j 

Note  that  the  Y|j(.)  are  transcendental  algebraic  functions  of  the 
manipulators  states  that  are  a  priori  known  and  may  be  realized  via 
feedforward  neural  networks  that  are  trained  offline  with  a  suitable 
learning  algorithm  (i.e.  BEP  [12]). 

Let  q^,  3d  designate  the  desired  trajectory.  Following  [13] 
define  the  virtual  reference  trajectory  qf,  qr,  and  the  virtual 
trajectory  velocity  error  4, 

4f-$d*Aei  ^"qa-Ac,  ef  «q*qr=*  i  +  Ac  (2.3) 

where  e[t]  -  q[t]-q,j[t]  is  the  joint  coordinate  error,  and  A  is  a 
positive  definite  matrix  with  constant  coefficients. 

The  output  of  the  neurocontroller  implements  the  control  law 

P 

V  ZY|jtq.4,Mr]0J  +  KdaV  . n  ,(2.4) 

j  *  1 

where  0  denotes  the  estimate  of  0,  and  ths  \  <  are  constant 
weights  for  the  servo  portion  of  the  controller .  il.te.v.ion  (2.4)  has 
been  shown  to  be  asymptotically  stable  when  ti  e  r.g  rale 

n 

^-•iV-Yjjtq.q.q^q^e  ,  j  •=  !  2.  .  p  .  (2.5) 

i  -  1  aH 

is  used  [13]. 

Notice  in  equation  (2.5)  the  similarity  to  the  IMS  lr..m..ng  rule 
(see  [14])  where  the  weight  change  is  prcjwb'eml  -o  :!-e  error  t 
and  the  input  features  X.  In  equation  (2.5i  (h#  ki^ui  ‘  .res  >re 
the  Yy(.)  functions  and  the  error  is  ef. . 

Figure  2.1  shows  the  internal  structure  y  ‘he  nruiuc  tna-o.ler 
for  joint  i.  Each  feedforward  network  module  .'s  t^.ne-J  :d  provide 
oneofthe  Yjj[q,q,<jt,qf]  functions.  This  train, ne  ,..m  be  done 
offline  since  the  Yjj[q,q,qf,qr]  functieni  .ire  known  a 
priori  and  are  the  same  for  all  rigid  robots  ■  f  the  same 
kinematics  and  number  of  degrees  i ■  'Ver  lem.  The 
inputs  to  the  neurocontroller  are  the  components  o!  t-e  -ra;ec tor.es 
as  required.  The  outputs  of  the  neurocontroller  .-e  the  err-.-ol 
torques  to  be  applied  at  each  joint  of  the  manipu'etcr. 


Selection  nf  F.xnlorntorv  Schedules 

Theorem  115] 

If  p  £  n,  and  if 

Rank  I  Y(  Md^d-^  Up 
I  lim  t->  •• 

then  the  controller  (eq.  (2.4),  (2.5))  guarantees  global  asymptotic 
tracking  of  the  desired  trajectory  and  identification  of  the  weights. 

An  implication  of  the  theorem  is  that  if  p  >  n,  a  sufficient 
condition  for  k  in  weights  to  be  identified  is  that  they  are  are  not 
in  the  null  space  of 

YCqdq4.qd>^ 

lim  t->  •• 

Which  implies  that  a  desired  trajectory  may  be  chosen  such  that 
columns  corresponding  to  the  k  S  n  weights  are  linearly 
independent,  which  will  guarantee  that  the  weights  are  identified. 
The  exploratory  schedule  then  consists  of  a  sequence  of  desired 
trajectories  which  are  designed  to  leam  different  components  of  the 
parameter  estimation  vector  8,  where  the  number  of  desired 
trajectories  is  such  that  all  p  components  of  0  are  identified. 

SIMULATION  RESULTS 

Dynamic  Model  for  Manipulator  with  2  DOF 
The  2  DOF  manipulator  is  shown  in  figure  3.1 

y 


figure  3.1  Simulated  2  DOF  manipulator. 


^Jlul„iii|;  the  link  mass  to  be  concentrated  at  the  tip  of  each  link, 
the  dy«!ti»ics  ofthe  2  D0F  manipulator  are 

+  D12q2  +  2Cq,q2 +Cq22  +  G!  (3.1) 

t2-l)|2^1+D22^#C(^tJ  +G2 

where 

D|  |  **  m(  liJ  +  m2 1  j2  +•  m2 122  +  2  m2  lj  12  cos(q2), 

Djj  m2 122  +  m2 1 1 12  cos(q2), 

D22  -  !22  • 

C  •  n'2  ’l  *2  s'n^2^ 

0,  -  K(m|  +  m2  )t,  sin(qt)  +  g  m2 12  sinCq,  +  q2) 

C2»  B  m2=l2sin(qj  +q2) , 

with  m|  -  m2  -  10.0,  lt  - 12  -  1.0,  g  -  9.81 . 


Exploratory  schedule  for  the  2  HOF  MnnintilaMr 

In  the  2  DOF  case,  p>n,  and  at  most  2  weights  can  be 
guaranteed  to  be  teamed  simultaneously.  An  exploratory  schedule 
was:  selected  such  that  different  portions  of  the  schedule  would 
provide  for  learning  different  weights. 

The  selection  of  the  exploratory  schedule  manipulator  was 
accomplished  by  noting  that  columns  4  and  5  of  the  Y  matrix  in 
(3.2)  are  functions  only  of  position,  therefore  selecting 

3dl-3d2“4dl"4d2"0,0*  and  qdj,  qd2  such  that  columns  4  and  5  are 
nonzero  as  e2>  e2,  ej,  i2  •>  0  will  ensure  that  weights  04  and  0$ 
are  identified.  Next,  selecting  qdj-q^-O.O,  and  q^,  qd2,  q^,  4c 
such  that  column  3  is  nonzero  in  the  limit  ensures  identification  of 
weight  02.  Then  selecting  q^t.  q<i2  such  that  columns  1  and  2  are 
nonzero  ensures  the  identification  of  weights  04  and  0j  .  The 
actual  exploratory  schedule  is  as  shown  in  figures  3.2  to  3.4 . 


Which  can  be  written  as  the 


where 

Y,,-qu 

Y2'“°* 

Y)2“Y22“‘il+‘l2' 

Yjj  -  cos(q2)(2qi+q2)-sin(q2)(2qlq2+ q22), 
Y23-cos(q2)<ii+sin(q2)qi2, 

Y,4-sin(qi). 

Yi4-°. 

Yjj  ■  Y2j  ■  sin(qi+q2). 


The  true  valucsof  the  weights  are 


0,  .  (m,+m2)  lj2  -  20.0  04  -  g(m,-+=m2  )1,  -  196.2 

02  m  m2 122  ■  10.0  0j  *  g  m2 12  *  98.1 

0j  ■  in2  l|.l2  ^  • 

'Hie  control  law  as  specified  by  eq.  (2.4)  is 


P 

V  2Y1j(q,q,q,.qf)0j  +  Kdiier.i  (3.3) 

j  • 1 

where  c,.  -  cj+ej,  Kdn  *  1000,  »  529.  and  the  learning 

algorithm  asidefined  by  (2.5)  was  used  to  update  the  weight 
estimates  0.  with  the  adaptation  constants  a  9.001. 

Nonlinear  feedforward  network  modules  were  tmmed  offline  using 
the  BEP  learning  algorithm  [12]  to  approximate  sin(x)  and  x*y, 
where  x  and  y  denote  the  inputs  to  the  network.  These  modules 

were  llten  combined  to  implement  the  Yjjtcq.q.q^  qf]  functions 

in  (3.3). 


figure  3.2:  Desired  joint  coordinates. 
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figure  3.3:  Desired  joint  velocities. 
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figure  3.4:  Desired  joint  accelerations. 

theoretically,  a  weight  is  guaranteed  to  be  identified  when 
ei>e2-4i-i2<,0*0*  However,  in  practice,  due  to  noise, 
disturbances  or  the  inability  to  wait  until  alt  transients  have  died 
out,  exact  tracking  may  not  occur  and  small  tracking  errors  may 


lead  to  small  errors  in  the  identification  of  the  weights.  In  this 
simulation,  a  weight  is  said  to  be  identified  when  Maximumfie^, 
!c:l  |c2l)  Se,0j,  where  e,oI  -  0.01,  ar.d  a  further  stipulation  that 
attempted  identification  of  each  weight  is  to  last  at  least  3  seconds. 
Furthermore,  errors  in  weight  identification  may  occur  due  to  the 
feedforward  network  modules  only  approximating  the  Yij(.) 
functions. 

The  time  period  corresponding  to  identificau'on  of  04  and  0j  is 
0  S  t  <  4.2.  As  can  be  seen  in  figures  3.5  to  3.8,  the  weight 
estimation  error  for  04  and  0j  tends  to  aero  as  all  joint  errors  fall 
below  0.01  at  t-4,2,  but  does  not  reach  zero.  The  time  period 
corresponding  to  identification  of  0j  is  4.2  S  t  <  3.3.  Again,  the 
weight  estimation  error  for  0j  approaches  zero  as  all  joint  errors 
approach  zero  at  t»8;3 .  weights  0(  and  0j  are  identified  during 
the  time  period  8.3  <  t  S  10 . 


figure  3.5:  Joint  coordinate  error. 


figure  3:6:  Joint  velocity  error  d^rrg. 


figure  3.7:  Estimation  error  for  weights  1,2,  and  3. 


figure  3.3:  Estimation  error  for  weights  4  and  5. 


4.  Conclusion 

We  have  shown  that  assuming  a  dynamic  model  for  the  plant 
allows  the  design  of  a  neurocontroller  with  guaranteed 
performance.  A  neurocontroller  has  been  proposed  which  utilizes 
available  a  priori  knowledge  of  the  plant  dynamics  to  train  a 
number  of  feedforward  network  modules.  The  output rof  these 
modules  were  combined  online  in  an  output  layer  to  provide  the 
control  torques  for  a  robot  manipulator. 

A  procedure  for  the.design  of  exploratory  schedules  for  a  2 
DOF  manipulator  was  presented.  Theoretically,  the  exploratory 
schedule,  when  used  in  conjunction  with  the  proposed 
neurocontroller  would  guarantee  identification  of  the  dynamics  that 
are  not  known  a  priori.  However,  as  shown  by  the  results  of  the 
simulations,  some  error-in  the  identification  does  occur.  This  error 
is  mainly  attributed  to  the  approximations  provided  by  the 
feedforward  networks.  In  previous  work  [15],  the  control 
••■"'.ecture  was  employed  using  exactly  computed  values  of  the 
Vi/.)  functions  rather  than  the  approximations  provided  by  the 
network  modules.  In  that  case,  identification  of  the  unknown 
dynamics  was  accomplished  using  the  same  exploratory  schedule 
as  presented  here. 
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ABSTRACT 

The  application  of  a  neural  network  approach  to  solution  of  the  EMG-torque 
relation  in  the  ankle  joint  under  isometric,  supine  conditions  is  addressed  in  this 
study.  The  backpropagation  neural  network  model  was  used  in  this  analysis  to 
simulate  a  multilayer  perceptron  for  solution  to  the  muscle  EMG  •  joint  torque 
mapping.  Measurements  from  6  muscle  sites  are  entered  into  the  model  as  the  input, 
while  the  torque  is  entered  into  the  model  as  the  ideal  output,  to  which  the  model 
output  is  compared.  The  incoming  muscle  signals  may  be  considered  the  "intent"  of 
the  system,  while  joint  torque  is  the  "controlled"  variable.  It  is  expected  that  the 
results  of  this  study  will  enable  the  application  of  neural  network  models  as  the 
adapter  (processor,  controller)  for  intent  recognition  systems  in  modelling  intact 
human  motor  control  about  one  joint  and  potentially,  for  applications,  such  as  in 
multi-joint  control  of  robotic  manipulators,  myoelectric  control  of  prosthetic  devices, 
and  control  of  functional  electrical  stimulators. 


A.  INTRODUCTION 

In  biological  organisms,  movement  is  controlled  through  an  amalagam  of  voluntary,  stereotypic, 
and  reflex  actions.  The  particular  mixture  for  any  specific  movement  appears  to  be  based  on  the 
functional  task  (goal)  to  be  performed  or  the  intent  desired.  Basic  movement  patterns  (strategies) 
appear  to  be  the  result  of  a  complex  control  system  that  interacts  with  sensory  feedback  mechanisms  to 
produce  movement  which  adapts  to  perform  the  functional  task  at  hand.  Biological  organisms  are  real 
time,  highly  parallel  and  distributed  systems  which  have  adaptive  ability,  learning  capability,  and  are 
governed  by  organizing  principles. 


Research  partially  supported  by  Grant  AFOSR  #890010  and  Calhoun  Research  Endowment.  We 
gratefully  acknowledge  R.  Triolo,  Research  Director  and  the  staff  of  the  FNS  Laboratory  at  Shriners 
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The  development  of  devices  in  both  robotics  and  in  prosthetics  for  amputees  have  been  similarly 
constrained  by  a  lack  of  understanding  of  movements  perfotmed  by  biological  organisms.  The  human 
amputee  and  paraplegic  offer  an  important  test  bed  from  which  an  understanding  of  the  human  control 
of  limbs  can  be  studied.  It  is  clear  in  the  case  of  prosthetic  and  orthotic  devices  that  movements  must  be 
anthropomorphic;  however,  for  robotic  systems,  anthropomorphic  operation  is  not  necessary,  although 
it  is  usually  assumed  since  the  function  of  robots  is  seen  to  be  a  substitution  for  the  human  operator.  If 
a  device  (either  biological  or  robotic),  is  to  function  in  other  than  a  pre-programmed  mode,  the 
communication  between  the  user  and  the  device  must  be  accurate  and  reliable. 

The  problems  that  exist  in  achieving  well  designed,  functional  limbs  are  therefore  related  to  our 
primitive  knowledge  of  how  human  limbs  operate.  In  both  cases,  the  question  arises  concerning  the 
basis  upon  which  to  build  the  controller  of  the  limb  motion.  The  need  exists  to  develop  a  control 
scheme  for  a  multi-joint  limb  based  upon  anthropomorphic  mechanisms. 


B.  BACKGROUND 

During  the  past  20  years,  several  prosthetic  limbs  have  been  developed  which  utilize  spatial 
patterns  of  signals  from  muscles  which  remain  after  a  limb  amputation.  In  addition,  the  employment  of 
electromyographic  (EMG)  signatures  for  controlling  functional  electrical  stimulation  (FES)  in 
paraplegics  in  a  manner  that  recognizes  and  executes  the  patient's  intended  limb  functions  and 
compensates  for  muscle  fatigue  is  currently  being  developed.  The  control  concept  is  that  by  sensing 
the  activity  of  the  muscles  which  would  have  supported  the  movement  of  the  missing  or  paralyzed 
limb,  the  character  of  the  intended  movement  of  the  prosthesis  or  orthosis  can  be  determined.  Sensing 
of  muscle  activity  is  achieved  by  recording  the  appropriate  muscle  (EMG)  signals.  This  scheme  has 
resulted  in  some  partial  success  in  devices  such  as  the  "Temple  Ann"(32),  the  "Swedish  Hand"  (2), 
the"Case  Western  Reserve  Arm"  (23)  and  the  "Utah  Arm".  In  each  case,  the  pattern  of  EMG  activity 
was  used  to  classify  an  intended  limb  movement.  Later  developments  have  included  the  "Drexel/Moss 
Knee"  (27,33),  Graupe's  model  of  EMG  as  a  scalar  time  series  (8,9),  and  Hogan  and  Mann's  model 
of  muscle  activity  using  maximum  likelihood  methods  (15,16).  Also,  Saridis  has  developed  an  upper 
limb  prosthesis  based  on  pattern  recognition  (29). 

In  each  of  these  developments,  assumptions  have  been  made  concerning  the  relationship  between 
EMG  activity  and  force  developed  about  a  particular  joint.  Either  implicitly  (15,16)  or  explicitly 
(25,19),  it  is  assumed  that  a  surface  EMG  signal  can  be  modelled  as  band  limited  white  noise 
modulated  by  the  level  of  muscle  contraction.  According  to  this  assumption,  using  the  EMG  signal  as 
an  indirect  measure  of  muscle  force  requires  processing  the  raw  EMG  signal  to  extract  the  level  of 
contraction  from  the  band  limited  white  noise. 

Under  the  limited  condition  of  isometric  contraction  (i.e.,  muscle  contraction  without  movement), 
there  are  disputes  among  investigators  concerning  the  dependency  of  muscle  force  on  processed  EMG. 
Some  investigators  have  suggested  a  linear  relationship  (19,25,31),  while  others  have  reported  relations 
of  higher  order  (24, 34,  36,37).  Moreover,  in  some  previous  work,  we  have  shown  that  most  of  the 
variability  in  reported  results  comes  from  the  variability  in  isometric  muscle  contraction  under  identical 
conditions  rather  than  from  the  type  of  signal  processing  used  in  analysis  (30).  The  question  of  the 
proper  relationship  between  EMG  and  force  becomes  even  more  complicated  when  non-isometric 
contractions  are  allowed. 


C.  NEURAL  NETWORK  MODELLING  BACKGROUND 

Neural  network  models  derive  their  principles  from  neural  systems  and  seek  to  attain  similar 
capabilities  in  artificial  devices  such  as  VLSI  collective  decision  circuits.  These  models  have  parallel 
inputs,  outputs,  and  internal  computations.  The  models  are  composed  of  computational  elements  or 
nodes  that  are  connected  by  weights  and  are  adapted  or  trained  during  use  to  improve  performance.  An 
underlying  theme  of  the  neural  network  concept  is  that  the  functional  units  that  govern  behavioral 
adaptation  are  distributed  patterns  across  a  network  of  cells.  In  distributed  models,  the  strength  of 
patterns  of  activity  over  many  units  determines  the  degree  of  participation  of  these  entities  in  functional 
events. 


Neural  net  models  are  commonly  used  as  pattern  classifiers  for  extracting  and  classifying  features. 
These  models  can  generate  spontaneously  useful  global  computational  functions  such  as  classification, 
optimization,  and  control.  The  ability  of  these  models  to  capture  non-linear  mappings  has  been 
documented  (28).  The  computational  properties  of  NN  have  been  successfully  applied  to  applications 
such  as  sensory/  motor  control  (7, 20, 21), robot  control  (4,10,12,26),  associative  retrieval  of 
information  (3,17,22,);  process  control  (12, 35), feature  detection  (6),  combinatorial  optimization  (18); 
and  adaptive  pattern  recognition  (5,12,14). 

The  promising  features  of  a  neural  based  approach  to  adaptive  control  of  myoelectric  prosthesis, 
orthotic  devices  or  robotic  manipulators  include  1.  control  laws  that  are  not  explicitly  stated  since 
learning  occurs  by  showing  examples;  2.  fault  tolerance  provided  by  massive  parallelism;  3. 
robustness  to  unmodelled  parameters  due  to  the  network’s  generalization  properties;  and  4.  abundance 
of  local  minima  in  the  networks  state  space  used  for  content  addressability  and  retrieval  of  information 
under  noise  and  uncertainty  (1 1,12,13). 

In  this  study,  we  address  the  feasibility  of  designing  a  "neural  network"  which  will  obviate  the 
need  to  specify  the  EMG-force  relationship  apriori.  If  the  neural  network  succeeds  in  classifying  the 
force  output  from  a  general  set  of  EMG  input  signals,  then  we  will  have  a  solid  basis  for  our 
expectation  that  a  complete  multi-joint  limb  can  be  controlled  using  a  neural  network  approach.  The 
outcome  will  be  a  more  anthropomorphic  limb  which  will  benefit  the  fields  of  prosthetics,  electrical 
stimulation,  and  robotics. 


P.  THE  BACKPROPAGATION  NEURAL  NETWORK  MODEL 

The  neural  network  backpropagation  model  (NN  model)  was  used  in  this  analysis.  The 
backpropagation  model  simulated  a  multilayer  perceptron  for  solution  of  the  muscle  EMG-joint  torque 
mapping  under  isometric  supine  conditions.  Measurements  from  6  muscle  sites  are  entered  into  the 
model  as  the  input,  while  the  torque  is  entered  into  the  model  as  the  ideal  output,  to  which  the  model 
output  is  compared  (Figure  1).  Data  from  an  architecture  composed  of  6  inputs,  two  hidden  layers, 
with  six  nodes  per  hidden  layer ,  and  one  output  was  implemented.  Symmetric  sigmoid  nonlinearities 
were  used  in  this  model.  Continuous  analog  inputs  and  outputs  were  assumed.  The  learning  rate  and 
momentum,  two  parameters  of  the  model,  were  initially  0.4  and  0.8,  respectively.  The  de aired  output 
torque  ranged  from  -1200  (sub-maximal  plantarflexion )  through  +1000  (maximal  dorsiflexion)  N-m. 
The  calculated  and  desired  outputs  were  normalized  between  -0.9  and  +0.9.  The  inputs  were  not 
scaled;  however,  these  values  all  ranged  from  0.0  through  +8.0  volts  (rectified,  filtered  EMG  signals). 

During  training,  each  presentation  to  the  NN  model  was  composed  of  6  input  values  (muscle 
signals)  and  1  output  value  (joint  torque)  at  0  degree.  One  sweep  of  the  data  of  the  network  represented 
4000  presentations  (training  signal).  The  4000  presentations  were  repeatedly  presented  to  the  NN 
model  until  the  weights  stabilized  and  the  error  (i.e.,  measured  joint  torque  output-calculated  (model) 
joint  torque  output)  reached  a  minimum.  Within  the  training  signal  there  are  represented  ail  levels  of 
torque.  The  data  presented  during  training  is  composed  of  a  data  file  that  is  pre-sorted  by  joint  torque 
levels.  The  NN  is  trained  by  being  shown  all  input  combinations  under  isometric,  supine  conditions. 
The  relation  between  the  EMG  signals  and  torque  is  embedded  within  the  network;  i.e.,  the  hidden 
units,  and  connections  (or  weights).  If  the  architecture  is  large  enough  to  store  the  EMG  -joint  torque 
relations  and  if  training  has  occurred  for  an  adequate  amount  of  time ,  then  it  can  be  expected  that  test 
signals  composed  of  muscle  EMG  that  are  presented  to  the  network  will  predict  the  corresponding  joint 
torque.  Successful  prediction  of  joint  torque  during  operation  from  any  set  of  EMG  inputs  under 
isometric  supine  conditions  is  expected.  Therefore,  it  is  predicted  that  a  successful  training  method  can 
be  implemented  in  training  the  EMG-joint  torque  mapping  with  the  use  of  this  NN  approach.  During 
the  prediction  (or  retrieval)  stage,  signals  are  presented  to  the  trained  network.  They  are  presented  to 
the  NN  only  once  for  prediction,  since  no  type  of  training  occurs  during  this  stage  of  die  analysis. 
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FIGURE  1.  NEURAL  NETWORK  ARCHITECTURE  6-6-6-1 


E.  EXPERIMENTAL  METHOD 

The  experiment  was  performed  in  the  FNS  Laboratory  of  the  Philadelphia  Unit  of  Shriners 
Hospitals.  A  KinCom  Robotic  Dynamometer,  6  surface  Electromyography  (EMG)  electrodes  and 
EMG  amplifier  system,  MicroVax  n  minicomputer  and  MicroVax  laboratory  system  for  data  acquisition 
were  utilized.  Data  analysis  was  performed  with  the  use  of  a  PDP 1 1-73  minicomputer. 

The  KinCom  II  Robotic  Dynamometer  (Chattecx  Corp)  is  a  microcomputer-based  system  which 
records  the  joint  angular  position  and  torque  developed  about  the  joint  upon  muscular  contraction. 
With  the  use  of  the  KinCom  computer  monitor,  subjects  visually  monitored  to  the  requested  torque 
level.  The  torque  output  and  muscle  signals  were  collected  on  the  MicroVax  II  computer  for  each 
torque  level.  The  data  presented  is  with  the  ankle  joint  set  to  0  degree. 

Six  sites  of  muscle  activity,  monitoring  the  agonist/antagonist  muscle  activity  about  the  ankle 
joint,  were  recorded  on  the  subject's  right  leg  using  surface  EMG  electrodes.  Tne  muscle  sites  were 
tibialis  anterior,  peroneous  longus,  flexor  digitorum  longus,  gastrocnemius-medial, 
gastrocnemius-lateral,  and  soleus.  The  EMG  electrodes  (Motion  Control)  contain  a  preamplifier 
(gain-200).  The  signals  were  full  wave  rectified  and  filtered  appropriately  (Band-pass  100-1000  Hz). 
Sampling  rate  was  100  Hz.  The  signals  were  amplified  to  allow  minimum  and  maximum  range  (across 
all  conditions)  to  fall  within  the  computer  limits  of  ±10  V. 

A  MicroVax  n  minicomputer  was  used  for  data  acquisition.  Sampling  rate  for  all  channels  (joint 
angle,  joint  torque, «  muscle  sites)  was  100  Hz.  Data  were  transferred  to  a  PDP  - 11/73  minicomputer 
for  analysis. 


F.  EXPERIMENTAL  PROCEDURE 


This  experiment  was  designed  to  measure  the  isometric  ankle  EMG-torque  relations  at  the 
measured  range  of  torque  output  ( -1200  to  1000  N-m)  during  isometric,  supine  conditions.  The 
subject  was  instructed  to  lie  supine  on  the  clinical  bed.  The  right  knee  was  braced  and  the  foot  was 
securely  fit  into  a  boot  which  was  attached  to  the  KinCom  II  device.  The  thigh  was  securely  strapped 
down  so  that  body  motion  was  prevented  during  the  tasks.  The  rotational  axis  of  the  system  was 
aligned  to  the  lateral  malleolus  of  the  subject's  foot. 

Under  the  control  of  the  KinCom  II  system,  torque  levels  in  both  plantarflexion  and  dorsiflexion 
at  0,50,100,200,400,600,  800,  1000,1200  N-m  were  measured  (note  that  the  maximum  dorsiflexion 
torque  was  1000  N-m  and  -1200  N-m  was  sub-maximal  plantarflexion  torque).  The  subject  was 
instructed  to  generate  the  designated  plantarflexion  or  dorsiflexion  torque.  Subjects  started  from  a 
relaxed  position  before  every  muscle  contraction.  The  subjects  maintained  the  muscle  contraction  for  5 
seconds  and  then  relaxed  again.  Adequate  rest  was  provided  between  each  contraction  as  well  as 
between  each  data  set  The  subject  was  provided  visual  information  of  the  designated  torque  output  by 
the  KinCom  II  system  monitor. 

G.  RESULTS 

The  NN  was  trained  by  being  shown  all  possible  examples  of  the  EMG-torque  relation  of  the 
ankle  joint  under  isometric,  supine  conditions  with  respect  to  torque  levels  (for  both  dorsiflexion  and 
plantarflexion  torques).  Trials  of  dorsiflexion  and  plantarflexion  signals  were  windowed  and  100 
samples  (representing  1  sec  of  data)  of  each  EMG  signal  from  each  torque  level  were  extracted  for  the 
training  set  The  tourf  number  of  samples  (presentations)  in  the  training  set  was  4000.  The  training 
set  is  sufficiently  sorted  with  respect  to  torque  level  that  good  training  did  result. 

The  results  that  follow  describe  a  NN  implementation  with  an  architecture  of  6  inputs  (EMG 
signals),  2  hidden  layers  (each  6  nodes),  and  1  output  (joint  torque).  The  number  of  iterations  during 
training  was  120,000.  This  represents  30  sweeps  of  the  full  training  set  to  the  network.  The  learning  • 
rate  and  momentum  were  0.4  and  0.8,  respectively.  Once  learning  has  begun  to  occur  at  approximately 
16000  presentations  (see  Figure  2),  these  parameters  were  decremented  to  achieve  a  more  finer 
learning.  Figure  2  depicts  the  error,  i.e.,  actual  joint  torque  (o)  -  calculated  torque  (+),  as  a  function  of 
the  number  of  presentations.  Note  that  by  16000  iterations,  learning  has  clearly  begun,  as  evidenced  by 
the  reduction  in  the  error  to  less  than  6  percent. 
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PRESENTATIONS 


Figure  2  represents  the  error  during  the  training  phase  of  the 
backpropagation  model. 


Figures  3a  and  3b,  represent  the  same  100  presentations  (of  4000)  of  the  sorted  training  set  at  1000 
(during  the  1st  sweep  of  the  training  set )  and  1 17,000  (during  the  30th  sweep  of  the  training  set) 
iterations.  The  activity  of  the  6  muscle  sites  represent  the  input,  while  the  torque  represents  the  output 
For  the  output,  the  actual  torque  levels  are  symbolized  by  o,  while  the  calculated  torque  levels  (during 
training)  are  symbolized  by  +.  When  the  NN  has  adequately  learned  the  EMG-torque  relation  (at 
1 17,000),  the  actual  and  calculated  (via  the  NN)  torque  levels  are  similar.  Comparing  Figures  3a  arid 
3b,  note  that  at  1000  presentations,  the  actual  and  calculated  torque  levels  are  quite  dissimilar  -  learning 
had  not  yet  begun  to  occur. 
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FIGURE  3.  Figures  3a  and  3b  represent  100  presentations  of  the  sorted  training 
data  at  1000  (during  the  1st  sweep  of  the  training  set)  and  at  117,000  (during  the 
30th  sweep  of  the  training  set)  presentations. 


Figures  4a  and  4b,  represent  some  typical  presentations  during  operation  to  the  previously  trained 
network.  Essentially,  the  muscle  signals  are  input  feedforward  to  the  network.  Accuracy  is  determined 
by  comparing  the  calculated  torque  (+)  (via  the  NN)  which  may  be  compared  with  the  actual  torque  (o). 
Figure  4a  represents  random  input  during  operation,  while  Figure  4b  represents  a  signal  input , 
composed  of  three  torque  levels. 
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FIGURE  4.  During  operation,  data  are  presented  feedforward  to  the  trained 
network.  Figure  4a  represents  a  random  input  of  various  torque  levels.  Figure  4b 
represents  a  signal  input,  composed  of  three  torque  levels. 
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H.  CONCLUSION 


The  model  requirements  based  on  the  current  approaches  to  multi-joint  (as  well  as  single  joint) 
motion  are  complex,  computationally  overtaxing,  require  apriori  constraints  and  simplifications  and  are 
not  predictive  in  all  cases.  It  appears  that  through  evolution  and  in  development  that  the  CNS  has 
learned  solutions  (whether  via  inverse  kinematics,  inverse  dynamics,  trajectory  planning  ,  or  task 
dependent  strategies)  for  the  control  of  multi-joint  limb  and  trunk  motion  to  achieve  goal -directed 
movements  in  an  organized,  efficient  and  adaptive  manner.  Economy  of  control  would  suggest  that 
stereotypic  movement  patterns  should  be  available  from  a  library  of  goal-directed  solutions  to  motor 
problems  to  accomplish  a  specific  task.  These  learned  solutions  may  decrease  the  computations 
required  by  the  CNS  for  optimal  achievement  of  the  intended  movement  goal. 

The  application  of  the  perceptron  as  an  adaptive  system  for  robotic  manipulator  control  was  first 
suggested  by  Albus  (1)  in  the  CMAC  (Cerebellar  Model  Articulation  Controller),  a  model  based  on 
neurophysiological  theory  of  cerebellum  function.  This  theory  suggests  that  an  input  to  the  cerebellum 
will  compute  an  address,  in  which  the  contents  of  the  address  are  the  appropriate  muscle  control  signals 
required  to  carry  out  the  intended  movement  In  the  CMAC  model,  control  functions  for  many  degrees 
of  freedom  of  a  manipulator  operating  simultaneously  are  computed  by  referring  to  a  table  rather  than 
by  mathematical  solutions  of  simultaneous  equations.  In  perceptron  based  controller  models  (such  as 
CMAC  and  backpropagation),  the  memory  management  techniques  take  advantage  of  the  continuous 
nature  of  a  control  function  by  allowing  similar  inputs  only  to  generalize  to  produce  similar  outputs  and 
dissimilar  inputs  to  produce  dissimilar  (independent)  outputs.  For  this  application,  the  neural 
network  backpropagation  model  may  be  described  as  a  multilayer  perceptron  feedforward  type  of 
controller.  The  incoming  muscle  signals  may  be  considered  the  "intent"  of  the  system,  while  joint 
torque  is  the  "controlled"  variable.  Embedded  within  the  trained  NN  are  the  mapping  of  the 
EMG-torque  relationship  over  the  full  range  of  torque  levels.  Essentially  this  mapping  may  be. 
considered  the  library  of  learned  solutions  of  the  EMG-torque  relations  of  the  ankle  joint  under 
isometric,  supine  conditions  (at  0  deg)  (Figure  5).  As  the  incoming  muscle  signals  are  applied  to  the 
trained  network,  the  library  of  solutions  (i.e.,  via  the  look-up  table  of  weights)  is  addressed  for  the 
corresponding  output  variable  (torque). 
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Figure  5.  Neural  network  modelling  for  EMG  to  Torque 
relation. 


Effective  understanding  and  modelling  of  biological  motor  control  will  aid  in  achieving  similar 
characteristics  for  real-time  control  of  complex  systems.  The  application  of  neural  learning  algorithms 
in  the  controller  for  intent  recognition  systems  such  as  robotic  manipulators,  myoelectric  prostheses  and 
functional  electrical  stimulators  may  be  appropriate.  In  fact,  for  applications  such  as  these,  it  is  desirable 
to  have  an  adaptive  controller  that  learns  to  control  the  system  while  in  operation.  This  is  achieved 
through  heirarchy  of  control  and  through  variations  of  the  learning  procedure  of  the  neural  net 
backpropagation  model  which  desires  state  information  rather  than  error  gradients  (26).  Based  on  the 
results  obtained  above,  specific  recommendations  will  be  made  concerning  the  design  of  adaptive 
controllers  for  multijoint  prostheses,  orthoses,  and  anthropomoiphic  automata. 
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ABSIBACX 

In  this  article,  we  explore  the  relationship  of  learning  and 
adaptation  in  robot  control.  Learning,  in  this  context,  is  the 
process  of  identifying  the  robot  dynamics  and  its  interaction  with 
the  environment  for  the  purpose  of  improved  tracking  over  an 
infinite  horizon.  Whereas,  adaptation  is  the  process  of  adjusting 
the  controller  to  comply  with  the  regulation  and  tracking  needs 
of  the  closed  loop  system.  We  thus  demonstrate  that  learning 
conflicts  with  adaptation  in  its  tendency  to  increase  the  present 
tracking  error,  due  to  the  minimization  of  different  criteria  (Dual 
control  principle). 

Exploratory  Schedules  (ES)  are  reference  trajectories  which 
are  specifically  designed  to  provide  efficient  closed  loop 
learning.  We  relate  ES  design  to  the  issue  of  input  richness  (or 
persistent  excitation).  Our  ES  represents  a  weaker  criteria  than 
persistent  excitation 

A  theorem  regarding  constructive  sufficient  conditions  for 
asymptotically  stable  closed  loop  learning  is  stated,  and 
examples  of  learning  in  1  and  2  degree  of  freedom  manipulators 
are  given. 


UNTRODUCHQM 

Adaptive  control  of  robot  manipulators  has  been  the  subject 
of  much  research  in  recent  years  ( see  Ref.  (1]).  Adaptation,  in 
control,  is  the  process  of  adjusting  the  controller  to  comply  with 
the  regulation  and  tracking  requirements  of  the  closed  loop 
system.  Direct  adaptive  controllers,  e.g.  [2]  (see  also  Ref, (3]), 
use  tracking  errors  of  the  joint  motion  to  direct  the  robot  model 
parameter  adjustment.  The  direct  adaptive  controllers  are  based 
on  the  full  dynamic  model  of  the  robot.  Learning,  in  this  context 
is  the  identification  of  the  true  values  of  the  manipulator 
parameters  in  closed  loop  operation. 

In  operation,  the  controller  is  given  a  trajectory  by  a  path 
planner  in  order  to  accomplish  some  useful  task.  The  controller 
then  adapts  the  parameters,  on  line,  so  as  to  satisfy  the  tracking 
requirements.  If  the  parameters  are  not  known  exactly,  there 
will  be  a  transient  period  of  tracking  error  while  adaptation 
occurs.  So  that  identification  of  the  true  parameters  is  desirable 
for  increased  tracking  precision. 


Present  robot  adaptive  controllers  (see  for  example  (2),  [4] 
[5])  use  robot  parameter  update  laws  which  rely  on  the 
assumption  that  the  parameters  are  constants.  Also,  to  guarantee 
learning,  a  trajectory  which  is  "rich"  enough  to  expose  all  the 
dynamics  of  the  manipulator  is  required  for  convergence  of  the 
parameter  estimates  to  their  true  values.  Richness  of  input  refen 
to  trajectories  such  that  the  internal  signals  of  the  parameter 
adjustment  mechanism  is  persistently  exciting  (see  (61).  [f  ^ 
trajectory  is  not  persistently  exciting,  stability  is  assured  but  nQj 
learning-  Thus  future  tasks  are  performed  with  the  $a ™ 
transient  tracking  errors. 

We  propose  (see  figure  1.1)  that  whenever  it  is  possible  to 
modify  the  input,  e.g.  prior  to  putting  the  manipulator  to  wort 
doing  useful  tasks  or  periodically  when  parameters  change  and 
the  path  planner  is  not  demanding  a  new  path,  that  there  will  be  a 
learning  period  where  the  controller  learns  the  true  values  of  the 
parameters  by  tracking  artificially  designed  Exploratory 
Schedules  (ES). 

ES  are  trajectories  specifically  designed  for  asymptotic 
learning  of  the  system  dynamics.  Any  trajectory  that  is 
persistently  exciting  could  be  used  as  an  exploratory  schedule. 
However,  the  synthesis  of  persistently  exciting  trajectories  is 
generally  not  straightforward.  In  this  article,  we  show  how  to 
construct  Exploratory  Schedules  which  guarantee  closed  loop 
learning.  Our  ES  are  not  necessarily  persistently  exemng. 


U*«r/P»th  Planner 


figure  1.1:  Block  diagram  of  the  proposed  system. 
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The  rest  of  the  article  is  formatted  as  follows.  Section  2 
presents  the  general  structure  and  properties  of  a  rigid  robot. 
Section  3  defines  the  adaptive  control  structure  used,  and 
presents  sufficient  conditions  for  global  asymptotic  learning  of 
parameters  which  is  not  explicitly  based  on  persistent  excitation. 
Secdon  4  describes  simulation  results  of  learning  with  1  and  2 
degree  of  freedom  (DOF)  manipulators.  Section  5  concludes  the 
article. 

2.  Rigid  Robot  Dvnatnir« 

The  closed  form  dynamics  obtained  by  the  Lagrange-Euler 
formulation  has  the  general  matrix  form  of  [7] 

D(q)q  +  C(q4)q  +  G(q)»t  (2.1) 

where 

D(q)  is  an  n  x  n  matrix  of  inertial  terms, 

C(q,q)  is  an  n  x  n  matrix  of  coriollis  and  centrifugal  terms , 

G(q)  is  an  n  x  1  vector  of  gravitational  terms, 
q  isannxl  vector  of  joint  coordinates, 
t  is  an  n  x  1  vector  of  forces/torques, 
n  is  the  number  of  degrees  of  freedom. 

The  formulation  results  in  n  second-order,  coupled 
differential  equations. 

Renarameterization  Pronertv  of  Robot  Dynamics 

It  has  been  pointed  out  by  a  number  of  authors  (e.g.  [5]), 
that  there  are  properties  of  the  dynamics  that  can  be  exploited  for 
robot  control.  The  reparameterization  property  is  repeated  here, 
as  it  will  be  used  in  other  sections.  The  property  states  that 

X  *  D(q)q  +  C(q,q)q  +  G(q)  »  Y(q,q,q,q)0 

where  Y(q,q,q,q)  is  an  n  x  r  matrix  of  known  functions  and  0  is 
an  r  x  1  vector  of  constant  parameters.  This  property  implies 
that  the  robot  dynamics  may  be  viewed  as  a  linear  operator  from 
a  suitably  chosen  set  of  parameters  to  the  joint  torques. 

3.  Closed  Loon  Learning  Via  Exploratory  Schedules 

In  [2],  an  adaptive  controller  which  guarantees  global 
asymptotic  tracking  was  formulated.  The  controller  is  based  on 
knowledge  of  the  robot  dynamics  structure  and  the  use  of  sliding 
surface  control.  A  major  advantage  of  this  controller  is  that  there 
is  no  need  to  measure  the  joint  acceleration.  The  convergence  of 
the  parameter  estimates  to  their  true  values,  however  is  not 
guaranteed.  In  this  work,  the  same  control  structure  is  used. 
However,  we  synthesize,  whenever  possible,  the  reference  input 
to  guarantee  both  tracking  and  learning. 


Contrnllpr  Struehir» 

For  the  system  of  equation  (2.1).  define  the  virtual  reference 
trajectory 

qr “qd-A Joedt  ,  i, 

qrsqd-Ae 

q‘raq<i-Ae 

where  q^  is  an  n  x  1  vector  of  desired  joint  coordinates.  e=  q-qj, 
and  A  is  a  positive  definite  matrix  with  constant  coefficients. 
Define  the  sliding  surface 

s*er  =  q-qf«  e  +  Ae  (3.2) 

Let  the  control  input  be 

X  »  6(q)qr  +  e(q,q)qr  +  6(q)-Kqs  (3.3) 

-Y(q,q,qr,qf)0-Kds 

where  6(q),  £(q,q),  G(q)  and  Q  are  estimates  of  D(q),  D(q,q), 
D(q)  and  0  respectively,  and  Kj  is  an  n  x  n  positive  definite 
matrix.  The  parameter  adaptation  law  is 

^«SM.Ka-lYT(q,q,qr,qr)s  (3.4) 

where  the  estimation  error  is  defined  as  H  =*  (!)  -  (.),  and  K, 
is  an  r  x  r  positive  definite  matrix  with  constant  elements. 

Rsfiisia  [8] 

If  r  5  n,  and  if 

Rank  j  Y(Md-V^  Lr 
I  lim  t->  » 

then  the  controller  (eq.  (3.3),  (3.4))  guarantees  global 
asymptotic  tracking  and  identification. 

Selection  of  Exploratory  Schedules 
An  implication  of  the  theorem  is  that  if  r  >  n,  a  sufficient 
condition  for  k  Sn  parameters  to  be  identified  is  that  they  are  are 
not  in  the  null  space  of 

Y^VV^ 

lim  t  •>  oo 

Which  implies  that  a  desired  trajectory  may  be  chosen  such  that 
columns  corresponding  to  the  k  £ri  parameters  arc  linearly 
independent,  which  will  guarantee  that  the  parameters  are 
identified.  So  that  different  time  sections  of  the  Exploratory 
Schedule  (which  specifies  q^.q^.q^  )  may  be  designed  to  learn 
different  components  of  the  vector  Q  it  its  dimension  exceeds  n. 
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it_SJMHlATlQN  RESULTS 

Manipulator  with  LEQg 

The  first  simulation  was  done  foe  a  single  link  manipulator 
as  shown  in  figure  4.1  J 


i 


figure  4.1:  Simulated  1  DOF  manipulator. 
The  dynamics  of  the  manipulator  are 
-t-md^q  +  mgd  sin(q). 


(4.1) 


where,  d  *  1,  g  *  10,  and  m  *  2  .  The  matrix  of  known 
functions  is 


Y(  q.qWc^q  +  gdsinfq)], 


(4.2) 


with  m  as  the  single  parameter  to  be  estimated.  The  control  law 
as  specified  by  eq  (3.3)  is 


t  *  [  d2  q, +g  d  sin(q)]m  -  kd(  q  -  4), 


(4.3) 


where  4r  *  4i  *  (q  *  qd>.  kd  *  1,  and  the  parameter  adaptation  law 
as  defined  by  (3.4)  was  used  to  estimate  m,  where  the  estimate 
is  denoted  as  m  . 

In  experiment  la. 


which  is  less  than  r  =  n  =  1.  Notice  in  figure  4.2,  that  even 
though  the  tracking  error  does  converge  to  zero,  the  estimate  of 
m  does  not  converge  to  its  true  value  of  2.0 . 

In  experiment  lb, 

qa^O.O.q.^l.O^RankJ^^)  -l 


which  equals  n.  In  the  second  experiment,  shown  in  figure  4.3, 
the  estimate  of  m  does  converge  to  its  true  value.  Thus  closed 
loop  learning  is  obtained. 


figure  4.2:  Experiment  la. 
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figure  4.3:  Experiment  lb. 


Manipulator  jdlli  2  BflE 

The  second  simulation  was  done  for  a  2  DOF  manipulator  as 
shown  in  figure  4.4 
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figure  4.4:  Simulated  2  DOF  manipulator. 

The  dynamics  of  the  2  DOF  manipulator  arc 

ti=^li^l  +  ®\2§2  +  ^  C  C1]C|2  +Cq22+Gj 

+D22Q2"  ^•<Hl2  +<^2 


(4.4) 


_ U VitfihA- 


where 

Dtl  *  mj  lj2  +  m2  lj2  +  nij  lj2  +  2  lj  lj  cos(q2), 
D12  =>  m2 122  +  mj  lj  I2  cos(q2), 

D22  *  ®2 122 .  _. 

C  *  mj  1, 12  sinCqj)  « 

Gt  =  gfmt  +  mj  )Ij  sinfq^  +  g  mj  I2  sin(qj  +  q2) 
G2  *  ^  m2 12  an(qj  +  q^), 

with  m.  ■  m2  »  10.0,  lt  *  I2  »  1.0,  g  =  9.81 . 

The  parimeten  and  their  true  values  are 


In  experiment  2a, 

qdl  -  0.5.  qd2  =  -0.5  =  )  =1 

which  is  less  than  minimum(r,  n)  =  2.  Notice  in  figure  4.4,  that 
even  though  the  tracking  error  (which  is  not  shown)  does 
converge  to  zero,  the  estimates  of  04  and  05  do  not  converge  to 
the  true  values . 

In  experiment  2b, 


0,  « (fflj+mj)  lt2  -  20.0  04  *  g(mj  +  mj  )lt  *  196.2 

02  *  m2 122  *  10.0  0j  >  g  m2 12  *  98.1 

03  *  m2  lt  I2  ■  10.0 .  (4.5) 

Which  leads  to  the  reparameterization  matrix 


Y(q.4.  q.q) 


III  YI2  Y13  Yt4  Y15 
Y2t  Y22  Y23  Y24  Y25  J* 


(4.6) 


where 

Yll*3i» 

Y21  * 

Yi2"Ya*?i +?2  • 

Y13  ■C0t(q2)(2ql+q2)-an(q2X2qiqy,r  q22), 
Y23  -  cos(q2)q1+  sin(q2)  qi2, 

Yw»sin(q1), 

Y14  *  0, 

Ylj  *  Y25  *  sm(q1+q2). 


The  control  law  as  specified  by  eq.  (3.3)  is 

t»Y(q,q,qr,qr)0.K(iS,  (4.7) 


where 

s  *  (cj+ej,  ©j+ej)7 
Kj  »diag(I000, 500). 

The  parameter  adaptation  law  as  defined  by  (3.4)  was  used  to 
update  the  parameter  estimates  0,  with 
Kt-l=»diag(  1000, 1000). 

The  dimension  of  Y(  q,  q,  q^.q,)  in  the  2  DOF  case  is  2  x  5 
so  that  r  >n,  and  at  most  2  parameters  can  be  guaranteed  to  be 
'.earned  simultaneously. 

The  initial  conditions  for  both  experiments  (2a  and  2b)  are  at 
qi(0)=  q2(0)=  4i(0)=  q2(0)=  0.  The  desired  velocities  for  both 
experiments  is  4ii(0)=  4^(0)=  0. 


<1,,  .  0.5.  0.5  =  Ranl^V;{)  -2 

which  equals  n.  The  linearly  independent  columns  for 
experiment  2b  correspond  to  parameters  04  and  05 .  As  shown 
in  figure  4.5,  the  estimates  of  04  and  0j  do  converge  to  the  true 
values. 


time  (see) 

figure  4.5-  Experiment  2b. 
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Frpinratnrv  crhprtnle  for  the  2  DOE  Manipulator 
Selection  of  the  exploratory  schedule  for  the  2  DO? 
manipulator  was  accomplished  by  noting  that  columns  4  and  5 
of  (4.7)  are  functions  only  of  position,  therefore  selecting 
and  q<jt,  such  that  columns  4  and  5 
are  nonzero  as  C|,  ei,  q,  q  •>  0  will  ensure  that  parameters  04 
and  05  are  identified.  .Vext,  selecting  q<ji=qd2=0-0.  and  q^, 
qd2,  qqi,  qd2  such  that  column  3  is  nonzero  in  the  limit  ensures 
identification  of  parameter  03.  Then  selecting  q^,  q^  such  that 
columns  1  and  2  are  nonzero  ensures  the  identification  of 
parameters  04  and  05  .  The  actual  exploratory  schedule  is  as 
shown  in  figures  4.6  to  4.8  . 


figure  4.6:  Desired  joint  positions  during  ES. 


figure  4.7:  Desired  joint  velocities  during  ES. 


figure  4.8:  Desired  joint  acceleranons  during  ES. 


Theoretically,  a  parameter  is  guaranteed  to  be  identified 
when 

e  1  =C2:=«l  =«2=  0.0. 

However,  in  practice,  due  to  noise  and  disturbances,  exact 
tracking  may  not  occur  and  small  tracking  errors  may  lead  to 
small  errors  in  the  parameter  identification.  In  this  simulauon.  a 
parameter  is  said  to  be  identified  when 

Maximum(leil,  le^  ,lql,  Ie20  £  e^i, 

where  et0|  =  0.01,  and  a  further  stipulation  that  attempted 
identification  of  each  parameter  is  to  last  at  least  3  seconds. 

The  time  period  corresponding  to  identification  of  04  and  05 
is  0  £  t  <  4.2.  As  can  be  seen  in  figures  4.9  and  4.1 1,  the 
parameter  estimation  error  for  04  and  05  approaches  zero  as  all 
joint  errors  fall  below  0.01  at  t=4.2. 

The  time  period  corresponding  to  identification  of  03  is 

4.2  S  t  <  7.3.  The  parameter  estimation  error  for  03  approaches 
zero  as  all  joint  errors  approach  zero  at  t=7.3 . 

Parameters  0j  and  0j  arc  identified  during  the  time  period 

7.3  <  t  £  10.3  . 
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figure  4.1 1:  Estimation  error  for  parameters  1,2  and  3. 


figure  4.12:  Estimation  error  for  parameters  4  and  5. 


S.  Conclusion 

In  this  article,  we  explored  the  relationship  of  learning  and 
adaptation  in  robot  control.  We  have  shown  how  the  design  of 
ES  is  an  essential  aspect  of  learning.  We  have  related  ES  design 
to  the  issue  of  persistent  excitation.  The  ES  represents  a  weaker 
criteria  than  persistent  excitation 

A  theorem  regarding  constructive  sufficient  conditions  for 
asymptotically  stable  closed  loop  learning  was  stated,  and 
examples  of  learning  in  1  and  2  DOF  manipulators  were  given. 

The  simulation  results  of  the  ES  for  a  2  DOF  manipulator 
(section  4.2),  showed  how  in  practice,  due  to  noise  and 
disturbances,  exact  tracking  may  not  occur  and  small  tracking 
errors  may  lead  to  small  errors  in  the  parameter  identification. 

We  have  demonstrated  a  method  for  the  selection  of 
exploratory  schedules  for  rigid  robot  manipulators  employing 
inverse  dynamics  controllers  with  direct  parameter  adaptation. 
A  natural  extension  to  this  work  is  to  generalize  the  concept  to  a 
broader  class  of  systems. 


We  have  specified  the  ES  as  a  desired  trajectory  that  is  to  be 
followed  to  do  learning  while  the  manipulator  is  not  doing  other 
useful  tasks.  However,  a  more  appealing  idea  would  be  to 
integrate  the  exploratory  schedule  into  any  trajectory  that  is 
specified  by  the  path  planner.  The  ES  would  take  the  torm  of  an 
exploratory  (or  probing)  signal  into  the  desired  trajectory  to 
cause  asymptotic  learning.  The  injection  of  exploratory 
schedules  in  this  manner  would  be  similar  to  the  concept  of  dual 
control  (see  Ref.(9))  in  that  the  probing  signal  ES  is  combined 
with  the  so  called  cautious  signal. 
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ABSTRACT 

An  unsupervised  parallel  distributed  adaptive  controller  for 
nonlinear  systems  is  proposed.  It  is  shown  to  provide  bounded 
tracking  and  asymptotic  regulation  following  an  arbitrary 
‘teacher’.  A  two  degrees  of  freedom  robotic  simulation  example 
is  provided. 


1.  PROBLEM  FORMULATION 
An  unsupervised  distributed  parallel  computing  architecture 
is  proposed  for  the  adaptive  control  of  nonlinear  dynamic 
systems  of  the  class 

x  (t)  *  A(x)x(t)  +  B(x)u(t)  (1) 

y(t)  *  C(x)x(t)  +  D(x)u(t)  (2) 

where  x(t)e  Rn  is  the  plant  state  vector.  y(t)e  Rm  is  the  output 

vector,  and  u(t)e  Rm  is  the  input  command  vector,  and  where 
A(x),  B(x),  C(x),  and  D(x)  are  uniformly  bounded  matrices  of 
corresponding  dimensions. 

The  proposed  controller  is  depicted  in  Figure  1  below:  It 
consists  of  a  teacher  model  which  contains  apriori  knowledge 
regarding  the  desired  input/output  plant  response  as  well 
as  the  repertoir  of  reference  commands  that  the  system  may  be 
subjected  to. 


Fig.  1:  Proposed  Adaptive  Controller 

The  teacher  dynamic  model  is  assumed  to  have  the  following 
representation: 


This  paper  is  based  upon  research  supported  in  cart  by  Drexel 
University's  Stein  Fellowship  Foundation  and  by  AFQSR  Grant 
No.  890010. 


x, (t)  *  At(x^x,(t)  +  B,(x,)ut(t) 

y, (t)-Q(x1)x1(t)  +  D,(xt)ut(t)  (4) 

where  x,(t)e  Rn*  is  the  state  vector,  y,(t)c  Rm  is  the  output 

vector,  and  ut(t)6  Rm  is  the  input  command  vector,  and  where 
A,(x,),  Bt(Xt),  Q(Xt),  and  D,(x,)  are  uniformly  bounded  matrices 
of  corresponding  dimensions.  It  is  emphasized  that  the 
dimension  of  the  model  is  unrestricted,  except  that  dim(y,)  * 

dim(y) »  m.  „  , 

The  parallel  distributed  adaptive  controller  has  structure 
similar  to  the  LMS  adaptive  layer  [Widrow,  85],  and  to  many 
other  neurocontroller  architectures  [Guez,  88).  It  receives  the 
input  'features’  vector  f(ut,  xt,  y)  and  generates  as  an  output  the 
process  contorl  vector  u,  where 

u(t) «  K(t)f(Ut,  x„  y)  (5) 

where  K(t)  is  the  adaptive  gain  matrix  of  appropriate 
dimension.  Each  Ky  gain  monitors  the  sensitivity  of  the  i-th 
control  loop,  namely  uj,  to  the  j-th  featu-:  of  the  system, 
namely  fj(ut,  x,.  y).  The  Ky  gain  adjusts  its  value 
independently  of,  and  simultaneosly  with  all  other  gains, 
according  to: 


Kj,(t) »  Mjj(t)  +  Njj(t) 

(6) 

Mij(t)-aij(y,.-yi)fj 

(7) 

|j'Njj(t)»-pijNij(t)+Yij(yH-yi)fj 

(8) 

where  oty,  Py  and  Yjj  are  positive  constants.  We  emphasize  that 
the  adaptation  law  (6)-(8)  is  performed  in  parallel  and  in  a 
distributed  fashion,  that  is,  the  K|j  gain  only  needs  data 
from  the  j-th  feature  and  i-th  output  components. 
Thus,  very  large  scale  dynamic  systems  may  be  considered  for 
this  controller.  The  adaptive  gains  consist  of  two  terms:  a 
"proportional"  term,  M^t),  and  an  "integral"  term.  N,,(t>. 
Notice  that  the  role  of  the  teacher  (3)-(4)  is  to  demonstrate  to 
the  adaptive  controller  what  should  be  the  appropriate  and 
desired  response  yt  for  any  specified  reference  input  ut.  Since 
the  teacher’s  model  is  incorporated  in  the  controller  structure 
(Figure  1),  it  yields  the  so-called  ’unsupervised’  learning  or 
adaptation. 

For  the  above  desenbed  control  architecture  we  use  recent 
results  by  (Bar-Kana,  87,  89b)  to  show  that,  under  some 
realistic  assumptions,  large  classes  of  nonlinear  plants  of  the 
form  (l)-(2)  with  adaptive  controllers  of  the  form  (5)-(8i  van 
perform  good  trajectory  tracking  and  also  guarantee  robust 
adaptive  stabilization  in  the  presence  of  any  bounded  input 
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commands  and  input  or  output  disturbances.  Furthermore,  this 
adaptive  controller  has  ben  shown  to  have  good  graceful 
jeeradation  ( or  fault  resistant )  properties  with  control  failure 
or  fast  changes  of  plant  parameters  (Morse  and  Ossman.  1989). 

MAIN  RESULTS 

We  describe  below  a  summary  of  our  main  results 
regarding  the  performance  of  the  controller  in  (6),  (7),  and  (8). 
We  will  assume  that  if  the  plant  is  stabilizable  the  resulting 
stable  configuration  is  "exponentially  stable"  as  defined 
below: 

DEFINITION  1  (Hahn,  1967]:  Let  the  general  nonlinear 
system  be  represented  by  the  n  th-order  vectorial  equation 

x  (t)  *  f(x.t)  and  let  x=*0  be  an  equilibrium  point  The 
equilibrium  point  is  called  "exponentially  stable"  if  all 

solutions  x(t)  satisfy  the  relation  lx(t)l  £  a  lx(0)l  e'?t  for  some 
scalars  a  >  0  and  p  >  0. 

THEOREM  1  (Hahn,  1967]:  Let  the  right  hand  of  the 

equation  x  (t)  *  f(x,t)  have  bounded  continuous  first  order 
partial  derivatives.  Let  the  equilibrium  be  exponentially 
stable.  Then  there  exists  a  Lyapunov  function  V(x,t)  which 
satisfies  estimates  of  the  form 


1.1 

1.2 

1.3 


l  lx(t)l2  S  V(x,  t)  S  o2  lx(t)l2 
x,  t)  £  -  a3  lx(t)l2 

£  o<  lx(t)l2  ,  i«l,2 . 


for  ct|,  a2,  a3,  and  a4  positive  constants. 

Because  we  deal. with  nonlinear  systems  we  cannot  expect 
to  be  able  to  find  or  even  show,  like  in  linear  time-invariant 
systems,  existence  of  positive  definite  quadratic  Lyapunov 
function  of  the  form  V(x)  -  xT(t)Px(t)  where  P  is  constant  and 
positive  definite.  However,  after  some  experience  with  specific 
nonlinear  systems  like  robots,  and  because  we  restrict  our 
discussion  to  nonlinear  systems  linear  in  control  of  the  form 
(1)*(2),  we  assume  that  exponential  stability  of  the 
autonomous  system  (1),  with  u(t)  *  0,  implies  existence  of 
Lyapunov  functions  V(x)  which  are  not  explicit  functions  of 
time.  Also,  since  we  can  always  write  V(x)  •  xT(t)P(x)x(t) 

and  then  M.x)  =  xT(t)(lfx)  +  P(x)A(x)+AT(x)P(x)]x(t) 

or  *x)  *  •  xT(t)Q(x)x(t),  we  will  use  in  the  subsequent 
discution  the  following  assumption: 

ASSUMPTION  1:  Exponential  stability  of  the 
autonomous  system  (1)  .with  u(t)  *  0,  implies  existence  of 
Lyapunov  functions  of  the  form  V(x)  =  xT(t)P(x)x(t)  and 


derivative  of  the  form  Yx)  =  -  xT(t)Q(x)x(t),  where  P(x)  and 
Q(x)  are  positive  definite  for  all  x  e  Rn. 

After  establishing  the  basic  definitions  and  facts,  we  can 
start  presenting  the  properties  of  the  adaptive  controller.  The 
following  result  shows  the  stabilizing  properties  of  the  main 
part  of  the  adaptive  controller  that  wc  propose. 

RESULT  1:  Let  us  assume  that  the  plant 


x  (t)  =  A(x)x(t)  +  B(x)u(t)  (9) 

y(t)  =  C(x)x(t)  (10) 

can  be  stabilized  by  some  constant  output  feedback,  Kv.  In 
other  words  the  fictitious  closed-loop  system*  is 
exponentially  stable  according  to  assumption  1.  Let  us 


define  y4(t)  =  y(t)  +  Ky  u(t).  and  use  the  adaptive  algorithm 
u(t)  -  -  K(t)y.(t)  til) 

with  the  integral  adaptive  gain  K(t)  =  N(t)  given  by 


N(t)  =  y,u)y[(t)r  ( 121 

(where  T  is  a  selected  posiuve  definite  scaling  matrix)  or  (for 
each  scalar  gain) 

^Nij=y1Jyt.ylj  (13) 

The  simple  algorithm  (1 1)-(13)  guarantees  boundedness  of 
all  values  involved  in  the  adaptation  process,  namely,  states, 
outputs,  and  adaptive  gains,  and  asymptotically  perfect 
regulation  for  the  augmented  system  (Fig!  2) 


Fig.  2.  The  closed-loop  control  system. 

x(t)  *  A(x)x(t)  +  B(x)u(t)  (14) 

y(t)  =  C(x)x(t)  (15) 

y»(0  =  y(t)  +  Kylu(t)  =  C(x)x(t)  +  Ky'u(t)  (16) 

such  that  y(t)  — >  0  and  yt(t)  -»  0  as  t  -»  <*>.  The  gains 
ultimately  reach  some  constant  values  which  allow  perfect 
regu!ation.(Appendix  A). 

We  want  to  use  the  stabilizing  nonlinear  algorithm  <12).  in 
combination  with  other  adaptive  terms  in  order  to  implement  a 
stable  trajectory-following  adaptive  algorithm.  Let  the  teacner 
generate  the  desired  trajectories  that  the  plant  must  follow  Let 

f(u„  xt,  y)  =  [  (y,  -  y)T,  Xy.  u[  ]T  (17) 

where  the  feature  vector  f(ut,  xt.  y)  uses  all  values  that  can  be 
measured,  like  the  input  commands,  the  teacher's  states,  and 
the  tracking  errors.  We  could  try  to  use  the  adaptive  algorithm 


Kjj(t)  =  Mjj(t)  +  Njj(t) 

(18) 

M^O-a^-y,  )fj 

(19) 

^-NiJ(0  =  Y,J(yl,-yi  )fj 

(20) 
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applicable  in^realistic  environments.  Ills  clear  that  the  perfect 
integrator  in  (20)  increases  without  bound  whenever  perfect 
Peking  is  not  possible.  Since  the  nonlinear  system  includes 
uncertainties  that  we  do  not  assume  to  know  or  identify,  and 
since  moreover,  input  and  output  disturbances  may  usually  be 
present,  we  do  not  start  with  the  proof  of  perfect  tracking, 
which  could  be  obtained  in  some  ideal  situation,  but  concentrate 
on  the  proof  of  robust  stability  under  nonideal  conditions.  By 
"robust  stability"  we  mean  boundness  of  all  values  involved  in 
the  adaptation  process  like  states,  adaptive  gains  and  errors, 
and  tracking  with  arbitrarily  small  tracking  errors.  Perfect 
tracking  in  idealistic  conditions  is  a  particular  case  of  the 
general  robust  tracking  under  nonideal  conditions.  The  only 
change  needed  to  guarantee  robustness  of  the  adaptive  system 

is  the  addition  of  the  by-pass  term  -  puNuft)  to  the  right  term  of 
(20).  and  get  the  complete  algorithm  (5)-(8). 

RESULT  2  :  Under  the  assumptions  of  result  1,  the 
adaptive  algorithm  (5)-(8)  guarantees  robust  stability  of  the 
system  (14)-(16)  [Guez  and  Bar-Kana,  1989]. 


x,  »  x2 

x2  =  D‘l[u,  +  x2x4(B,  -  Bj)sin(2xj)]  +  d^t) 
x  3  = 

x4  =  (u2  -  (x£/2)(B,  -  B3)sin(2x3)]/B2  * d.,(t) 


D  =  A|+  mnL)  +  Btcos*x3  +  B3sin*x3, 

L(  =  0.1  m  ,  At  =  0.01  kgm\  mB  =  0.6  kg, 

B)  =  0.06  kgm2,  B2  =  0.01  kgm2,  B3  =  0.05  kgm:. 

The  output  measurements  are 


yi  =  xi  +  do,<0 

y2  =  x2  +  d02(t) 


Let  us  assume  that  the  plant  needs  some  general  dynamic 
configuration  to  reach  stability.  Specifically,  let  H:  (Af,  Bf„ 
Cf„  Of.]  be  some  LTI  dynamic  feedback  controller  that 
guarantees  that  the  closed  loop  system  is  exponentially  stable 
according  to  assumption  1.  Then  we  can  state  the  following 
result: 

RESULT  3  s  Let  G(*)  be  some  nonlinear  system  of  the 
form  (9)-(10)  and  let  H:  (Af,  Bf„  Cf„  Dfl)  be  a  stabilizing 
controller  under  assumption  1.  Then,  the  adaptive  algorithm 
(5)-(8)  guarantees  robust  stability  of  the  augmented  system 
G»(')  *  G(*)  +  IT1  [Guez  and  Bar-Kana,  1989]. 

More  important,  when  improper  linear  controllers  H  are 
needed  to  stabilize  the  nonlinear  plant,  we  can  use  their  proper 
inverse  in  parallel  with  the  plant  This  way,  we  only  use  the 
knowledge  on  the  existence  of  an  improper  controller  and 
actually  use  a  proper  configuration  in  parallel  with  our  plant 
Specifically,  as  in  the  case  of  nonlinear  robotic  manipulators, 
PD  controllers  of  the  form  H(s)  =  K(l+qs)  can  stabilize  the 
manipulators,  and  if  K  is  very  large,  we  can  get  very  good 
tracking  in  ideal  situation.  However,  in  practice  we  do  not  want 
to  use  differentiators  or  high  gains  in  control  loops.  In  our 
approach  we  only  use  the  knowledge  on  their  mere  existence  to 

implement  simple  first  order  poles  of  the  ferrr.  H-1(s)=  y~  in 

parallel  with  the  plant,  where  D=  K'1  can  be  a  very  small  gain. 

Notice  that  we  do  not  guarantee  any  more  that  the  plant  is 
perfectly  tracking,  because  the  best  we  can  obtain  is 

y,(t)  =  y(t)  +  Du(t)  -»  y,(t),  although  we  actually  want 
y(t)  -*  y,(t).  Still,  if  the  maximal  admissible  gain  is  large 
compared  with  the  gain  of  the  plant,  then 
y,(t)  =  y(t)  +  T)u(t)  *  y(t).  For  a  quite  common  example, 
assume  that  the  gain  of  the  plant  is  10  and  the  maximal 
stabilizing  gain  K,,,^  =  100.  Then  we  use  D  =  1/ 100  =  0.01  in 
parallel  with  10  and,  as  the  references  show.  y4tt)  =  ytt)  for  all 
practical  purposes. 

ROBOTIC  EXAMPLE 

The  proposed  controller  (5)-(8)  was  applied  to  the  nonlinear 
robotic  system  (Desa  and  Roth,  1985]: 


where  d;  and  do  are  the  input  and  output  disturbances.  The 
teacher  was  given  by  the  simple  decoupled  model: 


and  it  was  tested  with  demanding  square-wave  input 
commands. 

In  paralel  with  the  plant  (21)-(24)  we  employ  the 
supplementary  feedforward  [Bar-Kana,  87,  89a] 


Supplementary  disturbances  were  added  to  check  robustness 


r  5  i 

r  0,1  1 

<V*>-  ; 

do(0  = 

(30) 

Ll0sin(100tH 

L-0.2  sin(120t)- 

and  the  adaptation  coefficients  were 


a,j  =  Y.j=I(X).;  Pi3  =  0.1  (31) 

Results  of  simulations  are  shown  in  Figure  3.  Figure  3a 
compares  the  first  plant  and  model  output,  and  figure  3b  'rows 
the  second  output.  Figure  3c  shows,  for  ilustration,  the 
behavior  of  the  adaptive  gain  K32(t),  It  can  be  seen  how  the 
adaptive  gain  moves  up-and-down  in  order  to  maintain  small 
tracking  errors.  Figure  3d  represents  the  norm  of  all  plant 
states,  to  show  that  no  hidden  state  diverges.  The  input 
disturbances  were  introduced  from  the  start  and  their  effect  is 
hardly  felt  at  the  output.  The  output  disturbances  were  then 
introduced  at  tj  =  0.16sec  and  t>  =  0.23  s.  All  values  remained 
hounded  while  the  plant  tracked  with  small  errors,  although  we 
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CONCLUSIONS 

This  paper  presents  a  parallel  distributed  computing  architecture 
for  adaptive  control.  Starting  with  some  prior  assumptions 
about  stabilizability  of  the  plants  «  results  tn  a 
unsupervized  architecture.  The  feasibility  of  the  method  is 
demonstrated  on  an  example  of  robotic  manipulator. 

APPENDIX  A  •  PROOF  OF  RESULT  1 
The  control  system  with  the  adaptive  comroler  ( 18)-(20)  is 


x  (t)  =■  A(x)x(t)  -  B(x)N(t)y(t) 

=A(x)x(t)  -B(x)KeC(x)x(t)  +B(x)KeC(x)x(t)  -B(x)N(t)C(x)x(t) 
*[A(x)  •  B(x)K,C(x)]x(t)  -B(x)(N(t)-KJC(x)x(t) 

=A,(x)x(t)  -B(x)[N(t)-KjC{x)x(t)  (A.l) 


y(t)*C(x)x(t) 


(A.2) 


We  must  prove  stability  of  all  the  values  involved  in  the 
adaptation  process,  like  states,  outputs,  gains.  Let  us  select  the 
quadratic  Lyapunov  function 

V(t)  =>  xT(t)P(x)x(t)  +  tr[(N(t)  -  Ke)r‘(N(t)  -  K*)T)  (A.3) 

where  tr  denotes  trace.  Notice  that  the  second  term  in  (A.3)  is 
only  a  short  notation  for  the  sum  of  all  terms  of  the  form 

(Njj(t)  -  Kjj]  fyi}.  Therefore,  (A.3)  is  a  positive  definite  quadratic 
function  of  all  states  and  adaptive  gains.  The  derivative  of  the 
Lyapunov  function  is 


V(0  *  xT(t)f>(t)x(t)  +  xT(t)P(x)  x(t)  +  xT(t)P(x)x(t) 

+  tr[(N(t)  -  Kt)r*NT(t)]+tr[N(t)r1(N(t)  -  k/] 

-xT(t)[f»(t)  +  P(x)A,(x)  +  A^(x)P(x)]x(t) 

•  xT  (t)P(x)B(x)(N(t)-KJC(x)x(t) 

-  xT(t)CT(x)[N(t>-KJTBT(x)P(x)x(t) 

+  trf(N(t)-igr1ry(t)yT(t)] 

+  trfy(t)yT(t)rr1(N(t)  -  K/]  (A.4) 

By  using  (14)-(15)  we  get 

V(t)  =  -  xT(t)Q(x)x(t)  -  2xT(t)CT(x)[N(t)-KJC(x)x(t) 
+2xT(t)CT(x)[N(t)-Ke]C(x)x(t)  =  -xT(t)Q(x)x(t)  S  0(A.5) 


already  that  the  gains  are  bounded.  Second,  x(t)  ■  0  implies  y(t) 

a  0,  which  implies  ft  (t)  a  0  and  then  N(t)  =  constant. 

Therefore,  N(t)  ultimately  reaches  some  constant  value  such  that 

the  plant  is  perfect  regulator,  as  the  theorem  ctaims. 
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Notice  that  because  \{t)  in  (A. 5)  is  not  a  function  of  the 
adaptive  gains,  we  only  claim  in  (A.5)  that  the  derivative  of  the 

Lyapunov  function  is  positive  semidefintte.  namely  V  (t)  £  0. 
The  selected  positive  definite  quadratic  form  of  V(t)  then 
guarantees  that  all  states  and  adaptive  gams  are  bounded.  The 
trajectories  of  the  adaptive  control  system  finally  reach  the  region 

definedby  V(t)  s  0  (LaSalle,  1981). 

It  is  easy  to  see  that  V(t)  s  0  implies  tin  =  0.  Therefore,  if 
we  ignore  the  adaptive  gains,  the  plant  is  asymptotically  perfect 
regulator,  but  what  about  the  gains?  First  of  all.  we  proved 
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Ahtfracf 

This  snide  repons  on  results  of  experiments  in  object 
recognition  with  a  combined  neural  network  /  expert  system 
architecture  (Neuro-Expen).  The  Neuro-Expen  architecture  is 
outlined  with  a  description  of  the  experimental  object  recognition 
system.  Results  are  reponed  for  the  recognition  of  a  20  pattern 
prototype  set  of  synthesized  binary  images  placed  at  arbitrary 
rotations.  A  100%  recognition  rate  was  obtained  under 
noiseless  conditions.  Addition  of  1%  and  2%  random  pixel 
noise  resulted  in  recognition  rates  of  95.2%  and  89.5% 
respectively. 
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A  NEURAL  NETWORK  APPROACH  TO  TARGET  RECOGNITION 
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ABSTRACT 

In  this  paper,  an  approach  for  distortion  invariant  target 
identification  and  target  classification  based  on  the  Adaptive 
Resonance  Theory-II  (ART-0)  is  presented.  The  neural  network 
employed  demonstrates  fast  unsupervised  learning  coupled  with 
stable  retention  and  retrieval  of  information.  To  keep  the 
dimensions  of  the  network  to  a  bare  minimum  and  to  speed  up 
the  computational  process  as  well  as  to  achieve  invariance,  six 
distortion  invariant  features  an  extracted  from  each  target  image 
and  are  used  as  network  inputs.  These  continuous  valued 
features  are  derived  from  the  geometrical  moments  of  the  image. 
The  ART  based  target  recognition  system,  (ATRS)  can  be  used 
in  two  different  modes  a)  as  a  target  classifier  and  b)  as  a  target 
recognition  system.  Several  parameters  associated  with  the 
network  itself,  allow  for  greater  flexibility  in  feature 
manipulation.  The  performance  of  the  ATRS  is  compared  with  a 
similar  system  employing  the  Multi  Layer  Perception,  (MLP) 
with  the  Back  Error  Propagation  learning  algorithm.  The  ATRS 
is  found  to  perform  better  on  three  major  counts:  Speed  of 
processing,  flexibility  in  feature  manipulation  and  noise 
tolerance.  Determination  of  critical  settings  of  the  various 
parameters  associated  with  the  network  is  crucial  in  tuning  the 
ATRS  to  ensure  stable  and  consistent  performance. 

i.  imopucrm 

The  problem  of  recognition  and  image  interpretation  is  an  old 
one.  It  involves  two  major  components:  a)  preliminary 
processing,  involving  extrapolation  of  information  from  given 
sensory  data  leading  to  an  intrinsic  image  and  b)  knowledge 
based  interpretation  of  the  intrinsic  image  yielding  information 
about  its  contents(ll][12].  What  is  new  is  the  deployment  of 
varied  associative  neural  network  models  that  attempt  to  solve 
the  problems  associated  with  area  b)  mentioned  above. 


This  research  was  partially  supported  by  a  grant  from  the  Air 
Force,  Grant  No.  AFOSR  890010. 


Associative  neural  networks  operate  by  correlating  input  data 
with  internally  stored  templates  to  determine  best  match.  This 
correlation  is  performed  almost  instantly  with  essentially  no 
searching  or  sequential  testing  and  is  in  the  form  of  an 
association  between  the  vast  array  of  information  that  describes 
the  current  situation  and  its  approximate  context.  Such  networks 
process  more  information  and  exhibit  a  higher  degree  of  fault 
tolerance. 

Most  associative  neural  network  models  that  are  employed  in 
feature  extraction  and  object  recognition  involve  an  explicit 
learning  or  training  session.  During  training  they  have  to  be 
specifically  familiarized  with  the  objects  in  the  problem  domain 
they  are  expected  to  identify.  This  training  procedure  is  usually 
lengthy  and  restricts  the  process  of  recognition  to  only  those 
features  that  have  been  learnt  by  the  network.  To  circumvent 
these  drawbacks,  we  have  considered  a  neural  network 
architecture  based  on  the  Adaptive  Resonance  Theory-II  to  solve 
the  problem  of  on  line  target  recognition.  We  have  shown  in 
this  paper  that  this  network  gives  an  improved  performance  that 
reflects  significantly  upon  the  processing  power  and  speed  of 
computation. 

2.  PROBLEM  STATEMENT 

Fast  image  identification  and  interpretation  is  crucial  for  any 
intelligent,  real  time,  target  recognition  system.  Accuracy  of 
target  identification,  noise  tolerance  and  the  capacity  to  handle 
image  distortions  are  aspects  that  play  an  equally  important  role 
in  their  design.  We  have  proposed  an  implementable,  yet 
simplistic  scheme  for  target  classification  and  target  recognition 
in  the  form  of  the  ATRS,  that  has  associated  with  it,  some  of 
the  attributes  mentioned  above.  The  ATRS  treats  a  target  as  a 
collection  of  sensory  data  describing  the  object  represented  in  the 
form  of  an  image  or  simpiy  a  scanner  photograph  of  the  same, 
detached  from  its  background.  Given  a  set  of  target  images  the 
recognition  system,  as  a  target  classifier,  is  expected  to  classify 
the  target  images  into  appropriate  target  categories  without  any 
external  supervision  or  prior  learning.  No  restrictions  are  placed 
on  scaling,  translation  and  orientation  of  the  target  image  and  the 
amount  of  admissible  noise. 


In  the  recognition  mode,  the  set  of  all  possible  prototype  target 
images  encountered  in  the  problem  domain,  called  the  prototype 
set,  is  presented  to  the  ATRS  prior  to  tiny  testing.  This 
supervised  mode  of  operation  enables  us  to  associate  with  each 
prototype  category  estaMgsbed.  a  target  name  or  identification 
number.  Learning  of  d*gro«ype  features  is  expected  to  take 
place  on  a  single  presentation  of  the  prototype  set.  Once  the 
prototype  categories  are  established,  the  ATRS  is  called  upon  to 
correctly  identify  any  given  target  image  that  may  be  a  distorted 
or  noise  prone  version  of  the  prototype. 


3.  IMPLEMENTATION  OF  THE  ATRS 

In  this  section  we  present  our  approach  to  the  solution  of  the 
problem  stated  in  section  2.  Figure  1.  is  a  general  schematic 
diagram  for  the  An  based  Target  Recognition  System : 


Figure  1.  Schematic  of  the  ATRS. 


Functianing-QLtht  ATRS : 

When  used  in  the  target  classificanon  mode,  the  ATRS  is 
presented  with  a  stream  of  differently  oriented,  scaled  and 
translated  binary  images  of  targets.  These  images  are  quickly 
encoded  by  the  systems  moment  generating  preprocessor  and  are 
passed  down  to  the  ART-II  classifier  that  instantly  classifies  the 
images  into  appropriate  target  categories.  The  output  of  the 
ATRS  is  a  category  structure  that  desenbes  which  of  the  input 
target  images  belong  to  which  category  and  the  number  of 
different  target  categories  established  thereof.  In  the  recognition 
mode,  the  ATRS  is  first  presented  with  the  prototypes  before  it 
is  called  upon  to  identify  an  arbitrary  target  image.  It  should  be 
noted  that  the  ATRS  is  sensitive  to  novelty.  If  a  parncular  target 

imacr*  hfle  nor  h^for*  Hnnno  th*  nrnrofvn^ 
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presentations,  a  new  target  category  is  established.The  output  of 
the  ATRS  in  this  mode  is  the  identification  name  or  identification 
number  of  the  input  target  image.  The  various  components  of  the 
ATRS  are  described  in  the  following  sections: 


Th#  moment  reneratint  nreproctaaor ; 


One  of  the  primary  issues  involved  with  the  problem  of  image 
interpretation  is  that  of  coding  information  contained  in  the 
intrinsic  image.  Coding  reflects  directly  upon  the  size  of  dtt 
network,  its  storage  capacity  and  the  speed  of  computation.  We 
have  icjtrtcted  ourselves  to  the  use  of  features  of  the  intrinsic 
target  image  instead  of  the  image  per  sc.  The  moment  generating 
preprocessor  takes  a  two  dimensional  MX  M  image  of  the 
target  arid  encodes  it  into  a  compact  feature  vector  comprising  of 
six  components.  This  feature  vector  is  derived  using  a  set  of 
nonlinear  moment  invariant  functions  defined  on  the  geometrical 
moments  of  the  target  image(l][4](13].  These  features  are 
translation,  orientation  and  scale  invariant  They  were  introduced 
by  Hu  and  were  later  applied  in  aircraft  identification  by  Dudani 
etal.[3)[5].  The  moment  generating  preprocessor  used  in  the 
ARTS  is  different  from  the  one  used  by  Stephen  Grossberg  and 
Gail  Carpenter  (91,  that  employs  the  Fourier  Mellon  invariant 
filter  to  achieve  2-D  spatial  invariant  image  coding.  However,  in 
either  case,  the  target  image  has  to  be  isolated  from  the  image 
background  before  it  can  be  input  to  the  moment  generating 
preprocessor.  On  the  other  hand,  this  problem  can  be 
circumvented  by  determining  critical  settings  of  the  vigilance 
and  noise  tolerance  parameters  associated  with  the  network 
itself. 


Given  a  two  dimensional  M  X  M  image  g(x.y),  (g(*.y),x.y  * 
0,1 . M-U  the  (p+q)th  geometrical  moment  is  defined  by: 

XX  x'Wx.y)  p.q  =  0,1,2... 

a*0 

Scale  invariance  is  achieved  if  the  M  X  M  target  image  is 
mapped  on  to  a  square  defined  by  X  and  Y  belonging  to  {-1,+lJ. 
The  grid  locations  will  no  longer  be  integers  but  will  have  real 
values  in  the  range  [-1.+1). 

m„=  XX  ’tVgOt-y)  P-q  =  0.1.2... 

To  make  the  moments  translanon  invariant,  the  central  moments 
are  defined  as: 

li-=*  XX  -YlV.y)  p.q  =  0.1.2.. . 


and  y  = 


moi 

'ttoo 


Scale  invariance  can  also  be  achieved  by  taking  into  account  the 


central  moments  with: 


G^-Band 

H<n 


+  l 


A  set  of  nonlinear  functions  defined  on  flpq  that  are  invariant 
to  rotation  translation  and  scale  changes  have  been  derived  in 
[l].The  first  six  functions  are: 

/t  =  ^20+^(8 
/2-(Q  arfl«)J+4fltt^ 

/j-fQja- 3QU)J+  (3Qjj-no)2 
/^(Ojo+QiiiV  <<vnoj)2 
/j-  ( %3Qa)(  Qa)  ( ( flrf  nQ)2-3(  Qa+  Q  jV 
(3  QjfQaK  na+  oa)  1 3(  na)2-(  na+ 

/*■  ( %  n«)  l  ( Q#  n0)J*(  <V  + 

4flu(  q)!  fljj*-  na) 


P2mf 

LTM 

FjSrm 


The  feature  vector  representing  the  target  image  is  given  by: 
Log|Gj|,  i=l . ,6 

The  ART.n  classifier : 

A  neural  network  based  on  the  Adaptive  Resonance  Theory-11  is 
adopted  as  the  primary  classifier  in  the  ATRS.  The  Adaptive 
Resonance  Theory-II  is  a  neural  network  architecture  that  is 
capable  of  learning  recognition  categories.  Recognition 
categories  are  described  as  subsets  of  environmental  input 
patterns,  continuous  valued  in  the  case  of  ART-n,  that  share 
invariant  properties.  The  network  extracts  and  learns  to 
recognize  these  invariants  to  enable  self  organization  and  self 
stabilization  of  its  recognition  codes  in  response  to  an  arbitrary 
sequence  patterns.  Recognition  codes  are  represented  as  stable 
states  and  their  spontaneous  emergence  through  the  systems 
interaction  with  the  environment  is  described  as  self 
organization[6][7][8]. 

The  network  used  comprises  of  six  input  channels  at  the  STM 
stage  designated  as  Fj.  These  input  channels  are  directly  linked 
to  the  ATRS  moment  generating  preprocessor.  The  number  of 
nodes  at  the  STM  stage  F2  is  kept  as  a  variable  depending  upon 
the  the  number  of  prototype  target  categories  that  are  to  be 
established.  An  important  aspect  of  the  network  tdennficacon 
process  is  the  determination  of  critical  settings  of  the  various 
parameters  associated  with  the  network.  One  such  parameter  is 
the  associative  vigilance  parameter  that  determines  how  fine  or 
coarse  the  learned  categories  will  be.  Another  parameter 
associated  with  the  network  is  the  noise  tolerance  parameter  that 
determines  the  level  of  noise  tolerated  by  the  system.  The  ART- 
■  II  network  topology  used  in  [7][6J,  is  shown  in  figure  2: 


Figure  2.  ART-0  network  topology. 

For  the  convenience  of  the  reader,  we  present  the  network 
equations  as  in  [61,  below: 

The  local  STM  activities  pj.  qi^.w;  and  x;  computed  at  stage  Fi 

are  as  follows:  For  i  *  1 . M  and  j  *  M+l . N  where, 

M:  number  of  Fj  input  channels  and  N:  number  of  STM  nodes 
at  stage  F2 

P,-ul+Xg(y^  q,-^ 

“.“eT™  v,=  f(x, )  +  bf(q , ) 

wi 

Wj=Ij  +auj  X'  =  e  +  ,WI 

(All  equations  are  expressed  in  a  dimensionless  form) 
v  represents  the  L2norm  of  the  vector  v .  Vector  norms  are 
shown  as  large  filled  circles  in  Figure  2.  while  the  transfer  of 
information  within  the  network  is  represented  by  the 
corresponding  arrows. 

f(x)  =  0if0<x<O 
1  if  x  >  0 

STM  parameters:  a  =  10.00  b=  10.00  c=0.1  d  =  0.9(0< 
d  <1) 

noise  tolerance  parameter,  0=1/  'JM. 

Ij  :  Input  vector  component 
Zjt :  Top  down  connection  weights 
Zy :  Bottom  up  connection  weights 

The  STM  Equations  at  stage  F2: 

Tj=IPiZij  Tj  =  max  ( Tj :  j  =  M+l . Ml 

i 

g(y^)  =  d  if  Tj  =  max  [  Tj :  the  jth  F2  node  has  not  been  reset 
on  the  current  trial] 


0  otherwise 


The  order  of  presentation  is  unimportant.  Target  images  also 
include  noise  upto  3%.  See  figure  3. 


tugat  2.1  tareet  2.2  target  2.2 

Figure  5.  Examples  of  target  images  (rotation  and  noise) 


The  ATRS  is  capable  of  identifying  all  these  target  images  with  a 
near  100%  accuracy.  When  a  new  target  image,  one  that  has  not 
been  seen  by  the  network  before,  is  input,  the  ATRS  establishes 
a  new  target  category  to  represent  it.  In  the  on  line  case,  when 
the  prototype  pattern  set  is  not  deterministic  and  is  not  presented 
before  testing,  It  should  be  noted  that  the  category  structure 
established  by  the  ATRS  for  a  given  set  of  target  images  is 
different  from  the  one  established  when  the  order  of  presentation 
is  altered.  In  this  situation,  the  ATRS  establishes  a  separate 
target  category  for  the  first  target  image  presented.  If  the 
following  target  image  is  the  same  as  the  first  or  is  a  distorted 
variation  of  it,  the  ATRS  groups  it  with  the  already  established 
target  category  associated  with  the  image.  In  the  default  case, 
when  the  following  target  image  is  different  from  the  one 
seen  before,  a  new  target  category  is  established. 

A  disadvantage  tn  employing  the  ART-II  as  the  major  classifier 
m  the  recogniuon  system,  as  it  is  used  in  the  proposed  model,  is 
the  its  incapacity  to  handle  noise  beyond  a  fixed  percentage.  This 
problem  is  related  to  fixing  the  attentional  vigilance  and  the  noise 
parameters  during  the  recogniuon  process. 


The  vigilance  parameter  associated  with  the  network  allows  for 
feature  manipulation.  It  defines  how  coarse  or  fine  the 
recognition  categories  will  be.  If  the  vigilance  parameter  is  kept 
too  low  there  is  a  possibility  of  false  grouping.  However,  this 
feature  allows  for  dealing  with  noisy  feature  vectors  apart  from 
the  noise  tolerance  parameter  of  the  network.  Determination  of 
on  line  critical  vigilance  tuning  to  obtaining  a  desired 
categorization  of  target  images  is  the  subject  of  ongoing 
research.  In  the  simulation  carried  out  all  parameters  are  kept 
constant. 

A  crossection  of  the  sample  data  relating  to  Figures  4,3  and  6, 
showing  the  different  moments,  the  noise  associated  with  the 
images,  the  rotations,  the  actual  target  category  and  the  category 
established  by  the  ATRS  are  shown  in  Table  1.1, 


A  comparison  of  the  ATRS  with  the  Multilayer  Perception 
using  the  Back  Error  Propagation  Learning  Algorithm  is 
summarized  in  Table  1.2(14],  The  data  set  used  in  the 
comparison  is  kept  the  same.  Seventy  two  target  images  with 
variable  percentages  of  noise  and  distortions  are  presented.  The 
various  parameters  of  the  network  are  kept  fixed  during  the 
presentations. 
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Table  1.2:  Performance  comparison  of  the  ART-II  with  the  BEP 

3.  DISCL'SSm 

We  have  shown  in  this  paper  a  new  technique  for  the 
implementation  of  a  neural  network  based  approach  to  on  line 
target  recognition  and  target  classification  employing  a  neural 
network  model  that  is  capable  of  seif  organization  and  last 
unsupervised  learning.  A  comparison  with  another  popular 
model,  the  BEP,  has  brought  into  sharper  focus  some  of  its 
attributes.  The  approach  taken  is  straight  forward  and  simple. 
Future  work  in  this  direction  includes  integrating  the  ATRS  with 
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a  segmentation  module  that  does  a  dynamic  scene  analysis, 
isolates  the  images  from  the  scene  and  presents  it  to  the 
recognition  system.  We  also  propose  an  on  line  adaptive  tuning 
of  the  attentional  vigilance  parameter  based  on  some  measur-  of 
confidence,  to  further  improve  recognition  in  the  ATRS. 
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Abstract 

A  hybrid  approach  to  the  iterative  solution  of  robotic 
manipulators  has  been  presented.  In  this  method  a  Multilayer 
Feedforward  Network  (MFN)  is  trained  to  provide  an  approximate 
solution  to  the  inverse  kinematic  problem  of  a  robot.  This 
approximate  solution  is  then  utilized  as  the  initial  guess  to  the 
iterative  procedure  which  provides  the  solution  within  some 
specified  tolerance.  Examples  involving  the  PUMA  360  robot  are 
given.  Since  the  time  taken  by  the  MFN  to  provide  the  initial 
estimate  is  only  a  fraction  of  the  time  taken  by  the  iterative 
procedure  to  teach  a  solution,  this  combination  can  be  usefuly 
employed  in  order  to  decrease  the  time  for  obtaining  the  solution  to 
the  inverse  kinemetic  problem  in  robotics. 

1.  Introduction 

Among  the  methods  of  controlling  a  robot  the  inverse 
kinematic  method  of  control  when  fully  utilized  requires  a  total 
knowledge  of  the  kinematics  of  the  robot.  A  robot  is  composed  of 
links'  connected  in  the  form  of  a  chain  with  the  so  called  'joints' 
also  known  as  the  degrees  of  freedom  (DOF).  The  determination  of 
the  position  of  these  links  in  space,  given  the  joint  displacements  is 
called  the  'forward  kinematics'.  The  problem  of  finding  the  joint 
displacements  from  a  specification  of  the  desired  endeffector's 
position  and  orientation  is  called  the  inverse  kinematic  problem 
(IKP). 

The  task  of  a  robot  requires  it  to  manipulate  different  objects. 
These  objects  are  represented  in  a  single  coordinate  system  or 
multiple  coordinate  systems  related  to  each  other  with  a  single 
(global)  coordinate  system.  The  handling  of  the  objects  is  done  by 
the  hand  or  the  'endeffector'  of  the  manipulator.  The  necessity  of  the 
inverse  kinematics  is  thus  a  natural  consequence.  Unlike  the  forward 
kinematics  problem  the  IKP  usually  does  not  have  a  unique 
solution.  Moreover  if  the  system  is  underdetermined  it  will  have 
infinite  number  of  solutions.  This  later  type  constitutes  the 
redundant  class  of  robots  (4], [13]. 
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A  variety  of  procedures  for  solving  the  inverse  kinematics  of 
these  robots  have  been  developed  [2], [10].  Robots  having  finite 
number  of  solutions  have  been  usefully  employed.  The  limitations  to 
successfully  operate  only  such  robots  is  largely  due  to  real  time 
controllability.  The  class  of  robots  that  do  not  have  a  closed  form 
solution,  also  called  unsolvable  manipulatory,  require  iterative 
procedures  in  order  to  obtain  the  joint  values  that  will  position  the 
endeffector  in  the  desired  position  and  orientation. 

A  robot  with  6- DOF  is  required  in  order  to  arbitrarily  place 
an  object  in  a  3D  space.  The  class  of  6-DOF  robots  for  which 
closed  form  solution  to  their  inverse  kinematics  exists  is  small. 
Moreover,  the  inverse  Inrematic  solution  of  these  robots  is  usually 
very  involved  and  requires  some  intuition  into  the  kinematic 
structure.  The  6-DOF  robots  that  do  not  have  a  closed  form  solution 
are  usually  solved  by  iterative  procedures  such  as  Newton  Raphson 
and  gradient  descent  methods  [4].  Robots  having  greater  than 
minimum  number  of  DOF  necessary  for  their  task  are  called 
redundant  robots.  These  robots  are  solved  either  by  fixing  the 
redundant  DOF  and  solving  the  remaining  joints  in  a  closed  form 
manner,  or  by  using  iterative  methods  so  as  to  obtain  a  solution. 

This  paper  discusses  problems  in  using  Multilayers 
Feedforward  Networks  (MFNs)  to  lcam  the  inverse  kinematic 
solution  of  robots.  Also  shown  is  the  computational  efficiency 
achieved  when  MFNs  are  used  as  an  initial  guess  for  the  inverse 
kinematic  solution  by  iterative  methods.  An  example  involving 
PUMA  560  robot  illustrates  this  procedure. 


The  forward  kinematic  problem  may  be  defined  as:  given  'q' 
the  n  dimentional  joint  variables  vector,  find  'X'  the  m  dimentional 
endeffector  position  and  orientation  specificaton  vector  in  the  Work 
Space.  This  can  be  expressed  as 

X  =  f(q)  (1) 

where  /  is  a  nonlinear,  continuous  and  differentiable  function 
relating  the  joint  variables  to  the  work  space  coordinates.  In  these 
terms  the  inverse  kinematic  problem  is  defined  as:  given  X'  the  rr. 
dimentional  endeffector  position  and  orientation  vector,  find  'q'  the 
n  dimentional  joint  variable  vector.  That  is  to  say  to  solve  Eq.  (1)  for 
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q  given  the  vector  X. 


q-f-'(X)  (2) 

Contrary  to  the  forward  kinematic  problem  the  DCP  in  general  does 
not  have  aiunique  solution.  In  the  method  proposed  in  this  paper 
only  the  forward  kinematics  is  utilized  to  solve  the  inverse 
kinematics  of  a  robot 


3.  Proposed  Method 

Our  approach  to  the  solution  of  IKP  of  robots  involves  an 
iterative  procedure  for  which  the  initial  guess  is  given  by  trained 
MFN*.  Fig.  1  shows  that  the  iterative  method  using  the  initial  guess 


Fig.  1:  Schematics  of  the  proposed  method  employing  the 
MFN  to  provide  the  initial  guess  40to  the  iterative 
procedure. 


qo  given  by  the  MFN.  The  MFN  is  provided  with  the  desired 
position  and  orientation  'Xd'  of  the  endeffector  of  the  robot  and  it 
provides  the  estimate  qo  of  the  joint  angles.  This  estimate  is  utilized 
by  the  iterative  procedure  to  provide  with  a  solution  q  that  satisfies  a 
specified  tolerance  for  the  position  and  orientation  of  the 
:endeffector. 

The  MFN  is  trained  in  a  supervised  manner  using 
multilayered  BEP  Algorithm.  The  training  data  constitutes  the  f$(q) 
vector  as  the  input  and  the  corresponding  'qn'  vector  as  the  desired 
output  of  the  network,  where  the  subscripts  signify  the  dimension  of 
the  vector.  This  pair  of  data  can  be  generated  using  only  the  forward 
kinematics  of  the  robot.  The  problem  related  to  arbitrary  data  points 
is  that  in  general  there  are  more  than  one  sol-tion  to  the  inverse 
kinematics  of  a  robot,  which  requires  us  to  select  ^nly  one  'type-  of 
solution  out  of  all  the  possibilities  as  the  tnm.ng  data  set.  Each 
'type'  ofisolution  is  bounded  by  the  so  terr-ed  .  -c-'ar  regions 
which  correspond  to  the  rank  deficiency  of  --e  jv.’hun  of  the 
manipulator  [13].  Thus  the  data  set  correspond.,-?  :o  one  type  of 
solution  can  be  generated  by  starting  at  any  ..  nfiguration/joint 
values,  and  storing  the  relevant  data  while  r.^rementing  in  joint 
space  and  at  each  step  ensuring  that  the  jacob-an  :.r not  near  rank 
deficiency.  The  rank  deficiency  of  thejacobian  .j.i  easily  verified 
by  taking  the  determinant  of  the  jacobian  (J)  or  v-.it  of  •  JJT>  in  case 
of  under  /  over-determined  manipulator  and  comparing  the  result 
with  zero.  Where  the  superscript  indicates  the  transpose  o pennon. 

Another  problem  encountered  is  in  the  range  of  the  joint 


angles.  For  a  particular  type  of  solution  the  entire  joint  range  for 
each  joint  may  not  be  covered.  Thus  only  portions  of  the  maximum 
joint  range  may  be  utilized.  These  portions  if  disjoint  need  to  be 
trained  by  separate  networks.  This  problem  will  be  further  clarified 
in  the  next  section. 

The  Network  Architecture 

The  proposed  MFN  architecture,  shown  in  Fig.  2,  consists 
of  n  (■  number  of  joints)  cluster  of  networks  each  cluster  consisting 
of  Kj  networks.  Where  K;  is  the  number  of  disjoint  regions  in  the 
solution  of  the  i  th  joint,  where  i  varies  from  1  to  n.  There  is  a  pair 
of  output  nodes  for  each  K,th  network,  one  of  which  is  trained  to 
give  the  solution  to  the  joint  variable  'i*  and  the  other  is  trained  to 
indicate  the  validity  of  the  solution  given  by  the  first  node.  This 
validity  may  be  taught  by  making  the  2nd  node  to  give  a  'high'  if  the 
solution  lies  in  the  domain  of  the  corresponding  1st  node  and  a 
'Low'  otherwise.  The  input  nodes  are  equal  to  the  dimension  of  the 
f(q)  vector.  Also  two  layers  of  hidden  units  are  employed.  All  nodes 
have  a  symmetric  sigmoidal  activation  function  [12].  In  order  to 
have  the  solution  to  all  the  possible  types  we  would  require  L  such 
networks,  where  L  is  the  number  of  types  of  solutions. 


Fig.  2:  The  Architecture  of  the  Network. 

The  training  of  the  network  consists  of  the  supervised  Back 
Error  Propagation  (BEP)  algorithm  used  to  train  the  output  nodes 
corresponding  to  the  joint  values,  given  by: 

q  *  tqt  q2  •  -  qJT  (3) 

for  :he  disjoint  region  of  solution  related  to  the  position  and 
orientation  specification  of  the  endeffector  Xd  which  is  provided  to 
the  input  layer  nodes.  The  position  specification  consists  of  the 
(x,y,z)  displacement  of  the  origin  of  the  coordinate  system  attached 
to  the  endeffector  with  respect  to  the  work  space  frame.  The. 

orientation  specification  consists  of  ($,9,y),  the  Eulerian  angles  [2]. 

Xd  *  [$  9  y  x  y  z]T  (4) 

•v  $  is  the  rotation  about  the  OZ  axis,  9  is  the  rotation  about  the 

OU  axis,  and  y  is  the  rotation  about  the  OW  axis,  in  the  given 
sequence,  in  endeffector’s  frame  of  reference. 
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Ibfi  Tteralive  Procedure 

The  schematics  of  the  iterative  method  is  shown  in  Fig.  3. 
This  procedure,  commonly  called  the  Newton  Raphson  method 
utilizes  the  linearized  approximation  as  given  below. 

d(f<q<»>))  ♦  J(q(,)JC<t))  dq(t)  -  0  (5) 

Eq.  (5)  is  solved  for  dfy),  which  is  the  change  in  the  qw  required  to 
give  a  better  estimate  for  the  joint  values.  The  joint  variables  are 
updated  as  follows 

%♦!)  *  9(t)  +  dq<t)  (6) 


Fig.  3:  Flow  diagram  for  the  iterative  procure  :o  find 
the  joint  values. 

Here,  d(f(q^tj))  jj  the  6  dimensional  differe-:ul  position  & 
orienution  vector.  is  the  (6  x  n)  ja.-  -an  -njtr.t  .-f  ;he 

manipulator  signifying  the  rates  of  each  elerr.e-s  j:  (  h(CJ 

,0  %)•  Where  'n'  is  the  number  of  joints  corresponds 
10  the  incremental  change  required'iri. the  joint  .  -.ra-ies  to  satisfy 
equation  (5)  for  a  'better1  solution.  The  subscript  r  c  .responds  to 
Ume  Jtcpj- Here  may  be  mentioned  that  only  the  ire-*  ledge  of  the 
ferwaid  kinematics  of  the  robot  is  sufficient  for  re  .  .■—putatson  of 
«q)  and  J(qjt), 

4.  Exatnnln 

.  solution  to  the  inverse  kinematics  has  been  pr  /iously 

pUM®^*ml  3  manipulators  [5].  The  Kinematic  S'  jcrure  of 
robot  Unimadon  was  used  for  the  proposed  me  thod. 


Also  the  results  thus  obtained  were  compared  with  the  conventional 
method  of  Iterative  method  alone.  The  PUMA  560  robot  consists  of 
'6*  revolute  joints.  The  first  three  joints,  that  correspond  to  the 
Waist,  Sholder  and  Elbow  motion,  contribute  mainly  in  positioning 
of  the  endeffctor.  While  the  last  three  joints  correspond  to  the  Wrist 
motion  and  contribute  mainly  in  orienting  the  endeffector.  This 
manipulator  was  chosen  for  ease  of  generation  of  data  and 
verification  of  results  since  it  has  a  closed  form  solution.  It  has  eight 
solution  for  a  given  positon  and  orienution  signified  by  Right/Left  • 
Shoulder,  Above/Below  •  Elbow,  and  Up/Down  •  Wrist.  The 
training  dau  corresponded  to:  LEFT  Arm,  BEIOW  Elbow  and  UP 
Wrist  configuration.  The  PUMA  560  parameters  were  taken  from 
reference  [2],  page  37.  However,  in  the  simulations  the  joint  limiu 
used  for  the  6th  joint  were  -180°  to  180°  instead  of  -266®  to  266®. 

As  mentioned  in  the  previous  section  that  there  may  be  more 
than  one  disjoint  region  in  which  the  solution  for  a  particular 
configuration  lies.  In  the  case  of  PUMA  560  with  the  above 
considered  configuration  the  values  for  joint  number  6  lie  in  two 
disjoint  regions.  One  cotTesponds  to  (-x/2  to  -x)  and'  the  other 
corresponds  to  (x/2  to  x).  These  need  to  be  trained  separately.  But 
in  our  case  we  were  able  to  combine  the  two  by  transforming  these 
two  regions  by  x  and  -x  respectively  so  that  now  they  lie  In  the  1st 
and  the  4th  quadrant  instead  of  3rd  and  2nd  quadrants  and  then 
trained  them  with  a  single  network. 

The  network  in  this  case  consisted  of  6  input  nodes  one 
output  node  each  for  the  6  joints  and  two  hidden  layers  for  each  joint 
consisting  of  32  nodes  in  the  first  layer  and  8  nodes  in  the  second 
layer.  The  standard  Back  Error  Propagation  algorithm  was  used 
with  learning  rate  of  0.1  and  the  momentum  term  as  0.2.  The 
training  was  done  for  138  persentation  of  the  data  set  which 
consisted  of  800  samples.  The  average  error  and  the  standard 
deviation  of  the  error  in  the  solution  given  by  the  MFN  for  each  joint 
taken  over  100  samples  is  given  in  uble  1.  From  uble  1  it  is  seen 
that  the  solution  given  by  MFNs  is  more  scattered  for  Joints  4  to  6 
as  compared  with  the  joints  1  to  3. 

Table  1:  Statistics  of  the  solution  given  by  The  MFN. 


Joint 

#  : 

Error  in  Degrees 

Average 

Standard  Deviation 

1  ; 

4.06 

13.428 

:  2  ; 

•0.16 

30.492 

:  3  ; 

5.64 

1U52 

:  4 

3.66 

61;236 

5 

-3.70 

24.912 

6 

-6.02 

36.828 

11-343 


For  the  purpose  of  comparison  as  to  how  good  is  the  initial 
guess  provided  by  the  MFN',  we  presented  the  initial  guess  as  the 
actual  solution  uniformly  perturbed  randomly  between  a  fraction  of 
joint  limits  to  the  iterative  method  alone  for  100  samples 
corresponding  to  each  fraction.  The  average  number  of  iterations  for 
the  100  samples  presented  for  each  fraction  of  perturbation  of  this 
Informed  Newton  Raphson  Method  (INRM),  indicated  by  INRM,  is 
plotted  in  Fig.  4.  For  these  same  samples,  requiring  the  positioning 
and  orientating  of  the  endeffector,  the  guess  given  by  the  MFN  was 
also  presented  to  the  iterative  method  as  the  initial  guess.  The 
corresponding  average  of  the  100  samples  for  the  solution  to  the 
initial  guess  provided  by  the  MFN  is  also  included  in  this  Figure 
which  is  indicated  by  MFNNRM.  From  Fig.  4  it  is  seen  that  the 
average  number  of  iterations  for  MFNNRM  correspond  to  less  that 
20%  perturbation  with  tespect  to  the  joint  limits  about  the  actual 
solution  for  INRM.  In  the  simulations  Eq.  (5)  was  solved  by  Gauss 
Elimination  method  and  partial  pivoting.  The  maximum  number  of 
iterations  allowed  for  the  iterative  method  were  100.  The  iterative 
method  was  successfully  teminated  when  the  norm  of  the  difference 
between  the  desired  and  actual  endeffector  position  and  orientation 
was  less  than  1.0E-#. 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0 
Per  Unit  of)oint  Limiu 


Fig.  4:  Comparison  of  the  proposed  method  with  informed 
Newton  Raphson  Method. 


Lastly,  the  proposed  method  was  compared  by  giving  a 
fixed  estimate  to  the  iterative  procedure.  This  Fixed  Estimate  was 

taken  as:  0j  *  0, 02  *  0,  03  =*  it/4,  04  *  0,  05  =  n/-.,  and  06  =  0, 
which  is  a  configuration  corresponding  to  which  the  MFN  was 
trained,  as  indicated  in  the  begining  of  this  section.  The  average  and 
standard  deviation  for  the  number  of  iterations  for  the  proposed 
method  and  the  Fixed  Estimator,  in  a  run  of  100  data  points  is  given 
in  Table  2.  From  this  table  we  can  see  that  the  proposed  method 
achieves  more  than  a  two  fold  efficiency  in  computing  with  more 
consistency.  Moreover,  it  was  observed  that  the  time  taken  by  the 
MFN  equals  two  time  units  of  the  iterative  procedure,  which 
amounts  to  less  than  10%  of  the  time  required  to  get  the  solution  by 
the  Fixed  Estimator  method. 


Table  2 


Method 

#  of  Iterations 

Average 

Standard  Deviation 

Proposec 

9.96 

12.95 

Newton 

Raphson 

21.09 

18.90 

f.fliscuaian 


A  new  approach  to  the  iterative  solution  of  robotic 
manipulators  has  been  presented.  This  approach  makes  use  of  the 
trained  MFN  to  provide  the  initial  guess  to  the  iterative  procedure.  It 
has  been  shown  through  examples  with  the  PUMA  560  robot  that 
the  proposed  method  achieves  at  least  two  fold  computational 
efficiency,  where  the  time  taken  by  the  MFN  to  provide  the  initial 
estimate  is  only  a  fraction  of  the  total  time  to  reach  a  solution.  Future 
efforts  will  be  directed  toward  achieving  better  estimates  by  the 
MFN  and  to  the  teaming  of  the  solution  to  the  DCP  of  redundant 
robots. 
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On  Learning  in  Neurocontrollers 

A.  Guez,  Z.  Ahmad 
ECE  Dept.  Derexel  University 

Exploratory  schedules  (ES)  are  reference  input  trajectories  designed  to  enhance  the  learning 
of  system  parameters.  Such  trajectories  in  general  may  not  be  the  desired  trajectories,  resulting  in 
larger  tracking  errors.  However,  ES  offer  faster  convergence  to  the  system  parameters  and 
therefore  yield  smaller  long  term  tracking  errors.  The  automation  for  the  design  of  ES  requires 
online  modification  of  the  desired  trajectory  to  enhance  learning  at  the  expense  of  poorer  initial 
tracking.  We  discuss  this  closed  loop  mode  of  generation  of  ES,  and  give  an  example  of  the 
benefits  achieved  by  the  utilization  of  ES  in  the  context  of  controller  design. 
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Exploratory  schedules  (ES)  are  reference  input  trajectories  designed  to  enhance  the  learning 
of  system  parameters.  Such  trajectories  in  general  may  not  be  the  desired  trajectories,  resulting  in 
larger  tracking  errors.  However,  ES  offer  faster  convergence  to  the  system  parameters  and 
therefore  yield  smaller  long  term  tracking  errors.  The  automation  for  the  design  of  ES  requires 
online  modification  of  the  desired  trajectory  to  enhance  learning  at  the  expense  of  poorer  initial 
tracking.  We  discuss  this  closed  loop  mode  of  generation  of  ES,  and  give  an  example  of  the 
benefits  achieved  by  the  utilization  of  ES  in  the  context  of  controller  design. 
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On  Generation  of  Exploratory  Schedules  in  Closed  Loop  for 
Enhanced  Machine  Learning 

A.  Guez,  Z.  Ahmad 
ECE  Dept.  Drexel  University 

Exploratory  schedures  (ES)  are  reference  input  trajectories  designed  to  enhance  the  learning 
of  system  parameters.  Such  trajectories  in  general  may  not  be  the  desired  trajectories,  resulting  in 
larger  tracking  errors.  However,  ES  offer  faster  convergence  to  the  system  parameters  and 
therefore  yield  smaller  long  term  tracking  errors.  The  automation  for  the  design  of  ES  requires 
online  modification  of  the  desired  trajectory  to  enhance  learning  at  the  expense  of  poorer  initial 
tracking.  We  discuss  this  closed  loop  mode  of  generation  of  ES,  and  give  an  example  of  the 
benefits  achieved  by  the  utilization  of  ES  in  the  context  of  controller  design. 
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HYBRID  EXPERT-NEURO  CONTROLLER  FOR  ROBOTS 

I.  Bar-Kuna,  A.  Guez,  and  l.Rusnak 
ECE  Dept.,  Drexel  University.  Philadelphia,  PA  19104. 

ABSTRACT 

A  hybrid  neural  network  and  expert  system  based  controller  is  designed  for  the 
control  of  rigid  robot  manipulators.  The  composition  of  neural  net  and  expert  system 
technologies  is  used  to  combine  the  strength  of  both.  The  neural  network  based  module 
deals  well  with  continuous  adaptation  to  small  and-  continuous  parameter  perturbations  in 
the  robot  dynamics  1 1  j.  and  with  improved  parameter  identification.  The  expert  system  is 
capable  oi  handling  a  low-order.  uniimums  and  complex  set  of  model  and  environment 
parameter  changes  via  a  set  ot  prestored  rules  realizing  informed  sequential  decision 
making  expertise. 

The  combination  of  these  modules  has  the  potential  of  improving  the  performance 
of  adaptive  robot  controllers  under  complex  task-requirements  in  changing  environments, 
when  variousmbjectives.  m;Ji  .is  performance,  stability,  identification,  error  reduction, 
noise  robustness,  task  variation,  etc  .  become  dominant.  The  final  product  is  an  Expert 
System  that  coordinates  the  ddlcicnt  objectives  of  the  tasks  of  a  robot  during  its  life  cycle. 

[1 1  A.  Guez  and  I.  Bar-K.::  j  "7  vs  dc c rce -of- freedom  Robot  Neurocontroller."  Conf.  on 
DecisiomandiControl.  Mono  • .  H.>  >  m.  u>u(>. 

f2|  J.  W.  Selinsky  and  A.  Gaez  'The  Rote  of  Apriori  Knowledge  of  Plant  Dynamics  in 
N'eurocontroller  Design."  Con'-.  on  Decision  and  Control,  Tampa.  Florida.  1989,  pp. 
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VITAL  FUNCTION  STATUS  -  A  Parameter  To  Facilitate  Decision  Making  In  Anesthesia 

I.  Nevo,  MD,  J.  V.  Roth,  MD,  F.  Ahmed*,  A.  Guez,  Ph.D* 

Dept,  of  Anesthesiology,  Albert  Einstein  Medical  Center,  Philadelphia,  PA; 

*  Dept,  of  E.C.E,  Drexel  University,  Philadelphia,  PA. 

The  anesthesia  environment  is  a  dynamic  environment  where  the  anesthesiologist  has  to  make, 
with  a  degree  of  uncertainty,  prompt  decision  in  order  to  prevent  disastrous  outcomes  to  the 
patient.  The  anesthesia  environment  often  consists  of  7  to  8  monitors  displaying  multiple  numbers 
and  waveforms  in  different  formats  at  different  locations.  Anesthesia  accidents  are  typically 
caused  by  more  than  one  factor  including  human  errors,  equipment  failure,  surgical  events,  or 
alteration  in  physiology  caused  by  patient's  disease.  Whereas  equipment  failure  is  still  a 
component  of  accidents,  the  greatest  number  of  anesthetic  mishaps  can  be  attributed  to  human 
error  (1,2).  There  is  a  need  for  simplification  and  improvement  of  the  information  made 
available  to  the  anesthesiologist. 

We  propose  a  new  parameter,  the  Vital  Function  Status  (VFS),  as  part  of  our  overall  project  to 
develop  an  expert  system  in  anesthesia.  The  VFS  is  a  semi-quantitative  value  which  summarizes 
the  patient's  physiologic  condition  and  depicts  at  what  level  of  danger  the  patient  is  in. 

Exploiting  recent  developments  in  the  fields  of  anesthesia,  perception  and  cognition,  expert 
systems,  artificial  intelligence,  distributed  sensing,  adaptive  signal  processing,  machine  learning, 
and  man  machine  interface  technologies,  a  prototype  has  been  partially  developed.  An  IBM  AT- 
286  with  an  80287  coprocessor  samples,  from  physiologic  monitors,  analog  signals  at  rates  up  to 
500  Hz  and  digital  signals  every  second.  Turkx)  C  is  the  development  environment.  The  system 
contains  two  major  subunits,  the  Vital  Function  Unit  (VFU)  and  the  Inference  Unit  (IU).  The 
physiologic  monitors  are  interfaced  into  the  VFU.  The  values  are  validated,  the  signals  are 
filtered,  and  derivative  parameters  are  computed.  Any  derivative  parameter  (e.g.  lung 
compliance,  systemic  and  pulmonary  vascular  resistance,  left  and  right-ventricular  stroke  work 
index,  oxygen  delivery,  arterial-venous  oxygen  difference)  for  which  there  is  sufficient 
information  is  calculated  every  second.  Each  value  is  compared  to  reference  values  residing  in  the 
Reference  Value  Unit  (RVU)  and  assigned  one  of  six  levels  of  danger  ranging  from  0  (no  danger)  to 
5  (immediate  critical  danger).  All  of  the  assigned  values  are  grouped  by  threshold  into  one 
parameter  VFS  having  a  value  from  0  to  5. 

On  the  screen  two  graphs  are  displayed.  The  VFS  evolved  within  the  last  three  minutes  (actual 
VFS)  and  the  trend  during  the  last  30  minutes  (trend  VFS)  with  scroll  back  option.  Once  a  change 
is  detected,  the  system  displays  the  parameter  that  first  deviated.  The  the  causal  event  can  than  be 
more  easily  identified. 

We  believe  that  more  frequent  analysis  of  monitoring  data  and  earlier  detection  will  facilitate 
decision  making  in  anesthesia.  It  may  reduce  the  anesthesiologist's  response  time  which  may 
prevent  further  deterioration  in  the  patient's  condition. 
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VITAL  FUNCTION  STATUS  -  A  Comprehensive  Display 
To  Enhance  Decision  Making  in  Anesthesia  and  ICU 

I.  Nevo,  MD,  J.  V.  Roth,  MD,  F.  Ahmed*,  A.  Guez,  Ph.D.* 

Dept.  of  Anesthesiology,  Albert  Einstein  Medical  Center, 
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It  is  estimated  that  between  2,000  and  10,000  patients  die  in 
the  USA  each  year  from  causes  at  least  partially  related  to 
anesthesia.  The  morbidity  rates  related  to  anesthesia  are  higher. 
Whereas  equipment  failure  is  still  a  component  of  accidents,  the 
greatest  number  of  anesthetic  mishaps  can  be  attributed  to  human 
error.  The  anesthesia  environment  is  a  dynamic  environment  where 
the  anesthesiologist  has  to  make,  with  a  degree  of  uncertainty, 
prompt  decisions  in  order  to  prevent  disastrous  outcomes  to  the 
patient.  The  anesthesia  environment  often  consists  of  7  to  8 
monitors  displaying  multiple  numbers  and  waveforms  in  different 
formats  at  different  locations.  The  human  brain  is  capable  of 
perceiving  concomitantly  seven  informational  inputs  and  of 
processing  efficiently  only  2  or  3  at  the  same  time.  Fatigue, 
inexperience  and  stress  further  decrease  efficiency,  may  cause  one 
to  fixate  on  a  specific  idea,  and  increase  the  probability  of  human 
error.  There  is  a  need  for  the  simplification  and  improvement  of 
the  information  made  available  to  the  anesthesiologist. 

We  propose  a  new  parameter,  the  Vital  Function  Status  (VFS) , 
as  part  of  our  overall  project  to  develop  an  expert  system  in 
anesthesia.  The  VFS  is  a  semi-quantitative  value  which  summarizes 
the  patient's  physiologic  condition  and  depicts  at  what  level  of 
danger  the  patient  is  in. 

Exploiting  recent  developments  in  the  fields  of  anesthesia, 
perception  and  cognition,  expert  systems,  artificial  intelligence, 
distributed  sensing,  data  acquisition,  adaptive  signal  processing, 
machine  learning,  and  man-machine  interface  technologies,  a 
prototype  has  been  partially  developed.  An  IBM  AT-286  with  an 
80287  co-processor  samples  from  the  physiologic  monitors  analog 
signals  at  500  Hz  and  digital  signals  every  second.  Turbo  C  is  the 
development  environment.  The  system  contains  two  major  subunits, 
the  Vital  Function  Unit  (VFU)  and  the  Inference  Unit  (IU)  .  The 
physiologic  monitors  are  interfaced  into  the  VFU.  The  values  are 
validated,  the  signals  are  filtered,  and  derivative  parameters  for 
which  there  is  sufficient  information  are  calculated  every  second. 
Each  value  is  compared  to  reference  values  residing  in  the 
Reference  Value  Unit  (RVU)  and  assigned  one  of  six  levels  of  danger 
ranging  from  0  (no  danger)  to  5  (immediate  critical  danger).  All 
of  the  assigned  values  are  grouped  by  threshold  into  one  parameter, 
VFS,  having  a  value  from  0  to  5. 

On  a  screen,  two  graphs  are  displayed:  the  VFS  evolved  within 
the  last  three  minutes  (actual  VFS)  and  the  trend  over  the  last  30 
minutes  (trend  VFS) .  Once  a  change  is  detected,  the  system 
displays  the  parameter  that  first  deviated.  This  allows  for  the 
causal  event  to  be  more  easily  identified.  This  will  assist  the 
anesthesiologist  in  responding  to  the  causal  effector  and  not  just 
to  an  event  in  a  cascade. 

We  believe  that  more  frequent  analysis  of  monitoring  data  and 
earlier  detection  of  deviations  will  facilitate  decision  making  in 
anesthesia.  It  may  reduce  the  anesthesiologist's  response  time 
which  may  prevent  further  deterioration  in  the  patient's  condition. 
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ABSTRACT 

The  anesthesia  environment  is  a  dynamic  and  stochastic 
system  where  an  overwhelming  amount  of  data  is  constantly 
displayed.  Nevertheless,  prompt  decisions  ought  to  be  made  and 
subsequent  treament  should  be  applied.  We  propose  the  ANAA 
(ANesthesiologist's  Adaptive  Associate)  which  is  an  adapuve 
real  time  expert  system. 


INTRODUCTION 


The  anesthesia  environment  is  a  dynamic  and  stochastic 
system  in  which  the  anesthesiologist  has  to  make  decisions 
promptly  to  prevent  disastrous  outcomes  (1).  Throughout  the 
surgical  procedure,  the  anesthesiologist  jias  to  maintain  the 
patients  vital  functions  stable  and  within  normal  ranges,  and  to 
prevent  a  change  in  any  vital  function  which  usually  propels  a 
deterioration  of  other  vital  functions/ 

The  human  brain  is  capable  of  perceiving  on  the  order  of 
seven  informational  inputs  concomitantly  and  processing 
effectively  only  2  or  3  at  one  time  [2].  Moreover,  sharing  the 
attention  reduces  the  efficacy  of  such  performance  [3].  The 
anesthesia  workstation  often  consists  of  seven  to  eight  monitors 
each  one  displaying  multiple  numbers  and  waveforms  in 
different  formats.  The  physicians  task  is  further  complicated  by 
the  scattered  information  and  the  responsibility  of  watching  the 
patient  as  well  as  the  surgical  field.  As  a  result,  too  often  times 
the  physician  performing  his  duties  reacts,  remotely  in  time,  not 
to  a  cause  of  an  initial  problem  but  rather  to  its  consequence. 

To  enhance  the  anesthesiologist  performance  and  to 
improve  the  patient's  outcome  we  are  developing 
ANesthesiologist's  Adaptive  Associate  (ANAA),  an  adapuve 
knowledge  based,  real  time  expert  system  (See  figure  below), 
that  will  work  in  association  with  the  anesthesiologist. 


ANAA  DESCRIPTION 


The  system  is  described  in  the  figure  below.  It  consists  of 
monitors,  data  acquisition  hardware,  signal  interfaces.  PC. 
expert  system  software,  feedback  loops  for  knowledge  base 
refinement  and  an  elaborate  display.  The  hardware  consists  of 
IBM-PC  AT  (w/  Intel  80286)  interfaced  with  seven  monitors 
through  analog  I/O  board  (Analog  Devices)  and  three  RS232 


serial  communication  ports.  The  expert  system  consists  of  two 
units:  Patient  Status  Unit  (PSU)  and  Inference  Unit  (IU). 


ANAA  OPERATION 


Patient  Status  Unit  (PSU): 

The  system  contains  a  set  of  Reference  Values  (RV). 
These  consists  of  normal  ranges  (matched  for  the  patient’s  age 
group)  and  five  different  levels  of  danger  for  each  vital  sign 
(from  none  to  severe  danger)  [1].  Prior  to  the  begining  of  the 
anesthesia  the  PSU  incorporates  into  the  RV  the  patient’s  basal 
vital  signs  values.  During  the  anesthesia  new  data,  as  well  as 
computed  values  (e.g.  compliance,  resistance  etc.),  are 
compared  to  the  RV  and  to  previous  values.  Subsequently, 
individual  danger  level  for  each  vital  sign  is  computed;  all  the 
danger  levels  are  condensed  to  give  a  single  display 
parameter.  The  displayed  new  parameter  is  a  rjjJ  time  cadtm’.S 

vital  Eunsaion  Status  (VFS)- 


This  unit  contains  the  anesthesiologist's  expcrtise.The 
danger  levels  of  each  vital  sign  will  be  sent  to  this  unit.  Once  a 
deviation  from  normal  is  detected  1U  will  display  the  following: 
a)  the  first  parameter  that  changed,  b)  a  list  of  differential 
diagnosis  (ranked  according  to  their  life  threat  potential,  and  c)  a 
list  of  proposed  actions  to  correct  the  deviation. 

The  IU  will  be  adaptive  in  nature,  i.e.,  it  will  suggest 
course  of  actions  drawn  from  a  knowledge  base,  which  can  be 
updated  and  refined  through  a  learning  loop  based  on  real  time 
performance  criteria.  The  IU  will  compare  its  suggested  actions 
to  the  action  taken  by  the  anesthesiologist.  If  they  are  different 
and  the  anesthesiologist’s  action  brings  the  patient  to  a  more 
desirable  state,  then  the  knowledge  base  gets  updated. 

This  system  will  not  use  any  probabilistic  or  statistical 
methodology,  rather  it  will  be  based  on  correlation  analyses.  It 
will  reduce  the  informational  overload  from  the  anesthesiologist, 
by  providing  a  composite  and  concise  display  of  information  on 
one  screen.  It  will  detect  the  first  change  that  took  place  and  the 
VFS  will  provide  an  holistic  information  regarding  the  patient 
situation.  By  suggesting  differential  diagnosis  and  actions,  it 
will  give  the  physician  a_wider  choice  of  options,  which  he, 
otherwise,  may  overlook,  l  ne  adaptive  learning  loop  will  make 
it  more  flexible.  Overall,  it  will  enhance  the  quality  assurance 
and  will  improve  patient  outcome. 


Optional 


ANAA 


Figure:  ANAA's  Architecture 
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Simple  adaptive  control  for  a  class  of  non-linear  systems  with 
application  to  robotics 

IZHAK  BAR-KANAt  and  ALLON  GUEZf 

Simple  adaptive  control  techniques  have  recently  been  developed  fos  continuous 
stationary  and  non-stationary  linear  systems,  based  on  the  fact  that  the  output 
stabilizability  property  of  systems  can  be  used  to  realize  'almost  passive’  configur¬ 
ations  that  may  facilitate  the  implementation  of  simple  robust  adaptive  controllers. 
This  paper  extends  these  concepts  to  a  class  of  non-linear  systems  that  are  linear 
in  the  control;  this  includes  manipulators  and  similar  systems.  Since  it  has  recently 
been  shown  that  non-linear  systems  that  can  be  stabilized  via  static  or  dynamic 
output  feedback  become  ‘almost  passive’  via  some  corresponding  parallel  feedfor¬ 
ward,  a  rigorous  proof  of  the  robust  stability  of  the  simple  adaptive  control  of  non- 
stationary  almost  passive  systems  is  presented. 


1.  Introduction 

Most  adaptive  control  techniques  (Landau  1979,  Astrom  1983)  assume  prior 
knowledge  of  the  order  (or  an  upper  bound  on  the  order)  of  the  unknown  controlled 
plant.  Prior  knowledge  of  the  pole-excess  in  the  plant  is  also  needed.  This  prior 
knowledge  is  used  to  implement  observer-based  controllers  of  the  same  order  as  the 
plant.  Since  these  assumptions  may  not  be  satisfied  in  realistic  large  systems,  a  simpli¬ 
fied  adaptive  control  approach  was  developed  by  Sobel  et  al.  (1979,  1982)  arid  Bar- 
Kana  and  Kaufman  (1982  a,  b).  The  developers  of  this  approach  were  influenced  by 
the  progress  of  adaptive  control  methods  in  the  late  70s,  especially  the  model  reference 
and  stability  analysis  approach  of  Landau  (1979),  Monopoli  (1974),  and  Narendra 
and  Valavani  (1979).  However,  they  were  also  interested  in  the  simple  adaptive  control 
methods  of  the  60s  (Whitaker  1959,  Osburn  et  al.  1961,  Donaldson  and  Leondes 
1963,  Parks  1966).  Since  from  the  very  beginning  they  had  to  deal  with  real-world 
large  scale  systems,  Kaufman  et  al.  (1981)— the  developers  of  this  approach — were 
forced  to  consider  real  world  constraints,  which  had  recently  been  considered  realisti¬ 
cally  in  adaptive  and  general  control  design  under  the  topic  ‘robustness’,  as  follows. 

(a)  The  order  of  real  world  plants  is  large  and  generally  unknown,  and  the  models 
available  for  the  control  design  are  only  low-order  approximations  of  the  real 
plant.  Even  if  the  order  is  known,  it  is  normally  too  large  to  allow  using 
controllers  of  the  same  (or  a  larger)  order  as  the  plant 

( b )  The  same  applies  for  the  pole-excess,  or,  in  the  general  multivariable  c-~~,  for 
the  MacMillan  degree  of  the  plant. 

(c)  Although  the  algebraic  conditions  for  the  minimum  order  of  stabilizing  con¬ 
trollers  are  an  outstanding  unsolved  problem,  most  real-world  systems  can 
easily  be  stabilized  with  simple  controllers.  Many  plants  are  even  stable  at  the 
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start.  However,  stability  is  the  first  and  necessary  condition  but  is  not  sufficient; 
the  control  designer  is  interested  in  performance. 

The  question  that  arises  from  the  constraints  and  assumptions  above  is:  can  the 
basic  stabilizability  of  the  controlled  plants,  combined  with  specific  non-linear  adaptive 
control  techniques,  fit  the  right  gains  to  the  right  operational  situation  so  that  perform¬ 
ance  is  obtained  while  stability  is  guaranteed?  The  answer  is  not  necessarily  positive 
when  non-stationary  or  non-linear  controllers  are  used.  Yet,  starting  with  modest  re¬ 
sults  (Sobel  et  al.  1982),  which  only  presented  the  first  conditions  needed  for  the  stable 
behaviour  of  a  basic  simplified  adaptive  control  algorithm,  this  approach  was  devel¬ 
oped  and  extended  to  result  in  a  simple,  robust,  and  well-performing  algorithm  (Bar- 
Kana  1989  d)  whose  implementation  is  feasible  for  most  real-world  applications. 

The  first  motivation  and  representative  examples  of  this  approach  are  large  flexible 
structures.  Whether  we  treat  them  as  finite  systems  or  as  infinite  dimensional  sytems, 
their  order  is  very  large  and  unknown.  Still,  they  may  be  stabilizable  via  low-order 
controllers.  Fixed  low-order  controllers  may  not  be  able,  however,  to  guarantee  the 
desired  performance,  and  this  job  we  leave  for  non-linear  adaptive  controllers.  Al¬ 
though  one  can  find  examples  of  plants  that  cannot  be  stabilized  by  controllers  oflesser 
order  than  the  plant,  these  are  not  common  where  processes,  planes,  missiles,  manip¬ 
ulators,  etc.  are  concerned,  rather  than  the  ‘general  model’.  Root-locus  or  Bode 
analysis  shows  when  the  information  on  the  plant  permits  such  an  analysis.  High 
order  of  a  plant  and  'unmodelled  dynamics’  do  not  necessarily  justify  problems  that 
some  adaptive  control  algorithms  are  expected  to  solve  in  the  well-known  examples 
due  to  Rohrs  el  ui  ( 1982,  1985)  which  are  both  stable  and  minimum-phase,  and  could 
have  been  controlled  even  by  a  controller  of  zero  order  (constant). 

Recently,  it  has  been  shown  for  stationary  and  non-stationary  multivariable  linear 
systems  (Bar-Kana  and  Kaufman  1985  a,  Bar-Kana  1986  a,  1987  a,  1988  e,  1989  b)  that 
stable,  simple  adaptive  controllers  (SACs)  can  be  implemented  using  some  vague 
prior  knowledge  or  assumptions  about  the  basic  stabilizability  properties  of  the  plant 
to  be  controlled.  In  fact,  the  simple  result  states  that  if  a  plant  can  be  stabilized  by 
some  simple  configuration,  the  adaptive  controller  can  adaptively  compute  the  de¬ 
sired  controller  gains  without  using  any  real  initial  knowledge.  The  stability  of  a 
non-linear  adaptive  control  system  is  guaranteed  if  the  inverse  of  some  stabilizing 
configuration  is  used  as  parallel  feedforward  to  augment  the  plant  to  be  controlled 
(Bar-Kana  1987  a.  b.  1989  c).  The  (multivariable)  augmented  controlled  system  is  now 
called  'almost  strictly  passive’ (ASP),  because  it  has  been  shown  that  there  exists  some 
positive  definite  static  (unknown)  output  feedback,  such  that  the  fictitious  resulting 
closed-loop  system  is  strictly  passive  (also  strictly  positive  real,  in  linear  time-invariant 
systems),  and  strict  passivity  is  characterized  by  some  relations  which  guarantee  the 
robust  stability  of  non-linear  controllers. 

In  other  words,  the  passivity  (or  positive  realness  in  stationary  linear  systems) 
conditions  required  to  guarantee  the  stability  of  a  non-linear  adaptive  control  system 
are  in  fact  a  diilerent  and  equivalent  formulation  of  the  basic  stabilizability  properties 
of  the  plain  :>•  he  controlled.  The  result  is  a  simple  and  robust  direct  model  reference 
adaptive  control  i  \1  RAC)  algorithm,  and  its  applications  now  include  non-minimum- 
phase  and  unstable.  non-stationary  or  non-linear  examples  of  missiles,  flexible  struc¬ 
tures.  motors,  helicopters,  manipulators,  and  other  servo  systems.  In  fact,  the  appli¬ 
cations  people  are  now  trying  to  collect  the  most  difficult  examples  that  can  be  found. 
The  simphciiv  of  implementation  may  make  it  attractive  for  engineers  with  practical 
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applications  on  their  mind,  although  theoretically  one  can  ask:  ‘How  do  you  know 
that  the  system  is  stabilizabie?’  This  question  is  not  so  serious  when  one  tries  to 
control  a  process,  plane  or  manipulator,  rather  than  the  general  control  model.  The 
adaptive  control  designer  must  take  into  consideration  that  most  control  designs  and 
even  robust  control  designs  are  not  adaptive.  The  problem  of  the  control  designer  is 
the  performance,  rather  than  the  stability,  of  the  control  system.  However,  without 
guaranteeing  stability,  performance  cannot  even  be  discussed.  When  called  to  select 
the  fixed  gains  of  the  controller,  the  designer  must  trade-off  his  various  choices.  High 
gains  imply  high  manoeuvrability,  but  they  also  imply  high  cost  of  control,  high  noise 
amplification  and  a  ‘nervous’  response  even  if  these  are  not  necessarily  needed.  Low- 
gains  imply  smooth  response  and  low  control  cost,  but  they  do  not  provide  the 
needed  manoeuvrability  when  it  is  required.  ‘Optimal’  gains  may  sometimes  provide 
a  suitable  compromise,  but  again  may  not  be  satisfactory  for  any  of  these  cases. 
Changing  the  gains  ‘adaptively’,  or  in  accordance  with  the  specific  situation  in  Oder 
to  maintain  performance  in  various  operational  environments  is  an  attractive  idea. 
However,  when  we  use  non-stationary  control,  stability  is  not  necessarily  guaranteed 
in  general,  even  if  the  corresponding  linear  system  was  guaranteed  stable.  In  the  past 
it  has  been  shown  for  linear  systems,  that  SAC  guarantees  stability  with  non-linear 
adaptive  controllers  over  the  entire  region  that  could  have  been  used  by  fixed  gains 
one  at  a  time.  This  paper  extends  this  result  for  a  class  of  non-linear  systems  that  are 
linear  in  the  control.  While  it  does  not  attempt  to  present  a  general  solution  to  the 
general  problem,  this  approach  may  still  offer  a  simple  and  robust  adaptive  solution 
for  many  difficult  realistic  problems,  and  the  completion  of  preliminary  linear  designs 
based  on  only  vague  knowledge  about  the  plant  to  be  controlled. 

If  a  controlled  system  is  ASP,  then  simple  adaptive  control  procedures  find  the 
necessary  gains  that  can  stabilize  and  control  the  system  without  having  to  know  or 
use  this  fictitious  static  feedback  or  any  prior  knowledge  about  the  plant.  This  prop¬ 
erty  can  be  very  useful  if  one  attempts  to  present  a  solution  when  the  order  of  the 
plant  is  very  large  and  unknown,  because  in  this  case  very  low-order  adaptive  control¬ 
lers  may  control  even  large  flexible  structures  that  are  passive  under  idealistic  assump¬ 
tions,  or  can  be  made  passive  in  realistic  environments  (Bar-Kana  et  at.  1983,  Bar- 
Kana  and  Kaufman  1984,  Balas  et  al.  1984,  Ih  et  al.  1987). 

While  a  reader  with  the  ‘general  problem’  in  mind  may  still  wonder  about  the  basic 
assumption  of  the  prior  availability  of  some  stabilizing  configuration,  this  result  is  use¬ 
ful  in  particular  control  designs,  when  some  prior  knowledge  on  the  plant  to  be  con¬ 
trolled  is  sufficient  to  stabilize  the  plant,  even  though  not  necessarily  sufficient  to  obtain 
the  desired  performance.  (It  is  usually  enough  to  know  that  the  plant  is  a  motor,  a 
manipulator  or  a  flexible  structure.)  In  many  cases,  proper  controllers  may  not  necess¬ 
arily  be  desired,  like  the  PD  controllers  with  the  general  (matrix)  transfer  function 
H(s)  =  K(1  —  .S'50 )  (Kidd  1985,  Slotine  and  Li  1988,  Bar-Kana  et  al.  1989).  However, 
while  Kidd  ( 1985)does  not  use  this  desired  controller  because  of  the  need  for  derivatives, 
and  while  Slotine  and  Li  (1988)  try  to  cope  with  a  noisy  tachometer,  we  only  exploit  the 
assumed  existence  of  the  fictitious  improper  controller  and  actually  use  its  proper  in¬ 
verse  in  parallel  with  the  plant.  This  way,  although  we  only  use  position  measurement 
sensors,  we  still  get  the  effect  of  the  desired  differentiators  without  really  differentiating 
(Bar-Kana  1987  a.  1988  a,d).  Since  it  was  proven  that  global  stability  and  robustness  of 
the  non-linear  adaptive  controller  operating  on  the  augmented  plant  is  guaranteed,  the 
adaptive  controller  is  now  called  to  fit  the  right  gains  to  the  right  situation  in  order  to 
minimize  the  tracking  errors  (Bar-Kana  1987  b). 
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An  objective  evaluation  of  this  approach  as  compared  with  others,  although  still 
based  on  its  first  results,  can  be  found  in  the  paper  of  Mattem  and  Shoureshi  (1987). 
Meanwhile,  new  papers  and  reports  may  better  explain  some  aspects  that  were  not 
clear  enough  in  the  past.  Along  with  new  developments  in  the  theory  and  applications, 
attempts  are  made  to  write  some  explanation  that  would  intuitively  give  the  motiv¬ 
ation  behind  our  approach.  Section  4  of  Bar-Kana  (1987  a)  may  be  the  best  introduc¬ 
tion  to  this  approach.  It  is  mainly  concerned  with  SISO  systems.  One  may  also  note 
the  non-linear  manipulator  application  of  that  paper.  A  new  tutorial  (Bar-Kana 
1989  d)  which  may  be  the  most  complete  presentation  of  the  theory  and  applications 
so  far,  gives  the  necessary  explanation  and  motivation  for  the  various  steps  and  parts 
of  SAC  in  multivariable  systems,  using  the  minimal  amount  of  mathematics  possible. 

A  good  challenge  to  this  simple  adaptive  control  algorithm  in  difficult  practical 
applications  is  offered  by  the  adaptive  control  of  robots,  missiles,  or  planes.  Missiles 
(Guez  1980),  manipulators  (Guez  and  Dritsas  1987),  and  planes  (Morse  and  Ossman 
1989),  like  flexible  structures,  are  in  fact  stable,  or  easily  stabilizable,  configurations. 
However,  they  are  difficult  control  objects  because  stability  does  not  necessarily  imply 
performance.  In  various  operational  environments  the  control  parameters  (angles, 
loads,  inputs,  disturbances)  change  quite  a  bit  and  fixed  controllers  may  have  a 
difficult  task  trying  to  maintain  good  performance.  Here,  SAC  may  provide  a  solution: 
the  necessary  gains  are  computed  automatically  such  that  the  performance  (small 
tracking  errors)  is  maintained  because  the  adaptive  controller  may  use  the  entire 
domain  of  admissible  gains,  the  right  gain  at  the  right  time.  We  mention,  in  particular, 
flight  control  reconfiguration  after  multiple  control-surface  failure  (Morse  and  Oss¬ 
man).  The  adaptive  controllers  move  rapidly  from  some  value  of  the  adaptive  gains 
to  a  new  value  more  fitting  to  the  new  situation  after  failure,  thus  maintaining  a 
stable  flight  instead  of  what  could  have  been  total  catastrophe.  This  approach  is 
therefore  quite  different  from  other  mainstream  adaptive  control  methodologies  that 
try  to  use  adaptation  in  order  to  reach  some  fixed  ‘optimal’  controller.  SAC  may  be 
considered  a  first,  modest,  or  even  primitive  example  of  intelligent  control,  because 
it  only  uses  a  single  control  configuration,  yet  attempts  to  fit  the  right  controller 
gains  to  the  right  environmental  and  operational  condition  in  order  to  maintain  the 
required  performance  (represented  by  small  tracking  errors). 

However,  although  SAC  has  been  successfully  applied  in  non-stationary  and 
non-linear  systems,  proofs  of  stability  only  cover  linear  systems.  The  state-space 
representation  of  the  systems  and  the  Lyapunov  stability  analysis  of  the  adaptive 
algorithms  allowed  in  the  past  for  a  unified  presentation  of  SISO  and  multivariable 
systems  (Bar-Kana  1987  b,  1989  c),  the  the  first  attempts  at  non-linear  systems  show 
that  the  techniques  can  be  rigorously  extended  to  the  classes  of  non-linear  systems 
that  include,  for  example,  robots  and  missiles. 

This  paper  extends  these  concepts  to  non-linear  multivariable  systems.  The  robust 
model  reference  adaptive  control  algorithm  for  almost  passive  systems  is  presented 
in  §  2,  and  is  shown  to  be  based  on  the  basic  sabilizability  of  the  plants.  A  robot 
manipulator  application  is  then  presented  in  §3,  and  some  conclusions  in  §4. 


2.  Preliminaries  and  basic  definitions 

Let  us  consider  a  dynamic  non-linear  plant  given  in  the  following  state-space 
representation 


xp(t)  =  Ap(.tp).Vp(r)  +  Bp(.Xp)«p(t) 


(1) 


Simple  adaptive  control  for  non-linear  robotic  systems 


81 


y,(t) *  C,(x,)x,(t)  +  Dp(xp)up(t)  (2) 

where  xp(t)  e  R\  y,(t)  €  R",  up(t)  e  IR",  and  where  the  matrices  -4p(x„),  Bp(x„), 
Cp(xp),  Dp{xp)  are  uniformly  bounded.  We  use  the  not  necessarily  strictly  causal  form 
{A„{xp),  Bp(xp),  Cp(xp),  Dp(xp )}  only  for  the  sake  of  completion  of  the  treatment,  and 
the  following  results  remain  perfectly  valid  if  the  systems  are  strictly  causal  (Dp(xp)  = 
0). 

Definition  1 

The  system  (1),  (2)  is  said  to  be  almost  strictly  passive  (ASP)  if  there  exists  a  positive 
definite  static  feedback  matrix  K,  such  that  the  resulting  closed-loop  system  is  strictly 
passive  (SP)  (Fig.  1). 


Figure  1.  First  strictly  passive  configuration. 


Figure  1.  First  strictly  passive  configuration. 

It  is  easy  to  see  that  by  using  the  controller 

up(t)=  -K,ya(t) +  1*^(1)  (3) 

we  get  the  following  closed-loop  system  (Fig.  2) 

x„(t)  =  Atx{Xp)xp(t)  +  Bpc(-Xp)upc(f)  (4) 

y„(t)  =  Cp^XpJXpfr)  +  Dpc(xp)upc(t)  (5) 

where 

<4pc(*p)  =  Ap(Xp)  -  Bp(Xp)K„(Xp)Cp(Xp)  (6) 

K«(xp)  =  +  D„(xP)K'-}  ->=[/  +  KeDp(xr)]  ‘ 1  Ke  (7) 

Bpc(-Xp)  =  Sp(Xp)[/  +  /CfDp(Xp)]-1  (8) 

Cp,(Xp)  =  [/  +  Dp(xp)Kt  ]  ' 1  Cp(Xp)  (9) 

Dp .(x,)  =  [/  +  Dp(xp)Kt]  - 1  Dp(Xp)  =  Dp(Xp)[/  +  Kt  Dp(xp)]  ' 1  (10) 

where  Dp(xp)  >  0  and  Dp^Xp)  >  0,  which  means  that  £>p(xp)  and  Dpc(xp)  are  non¬ 
singular  and  positive  definite  for  all  xp  s  fR". 

The  strictly  passive  system  (4),  (5)  satisfies  the  following  relations  (Willems  1972): 

P(Xp)  +  P(Xp)/lpC(Xp)  +  /lJc(Xp)P(Xp)=  -Q(Xp)-  LT(Xp)L(Xp)  (11) 

P(xp)Bpc(xp)  =  Cl  (Xp)  -  LT(xp)  W(Xp)  ( 1 2) 

DpAXp)  +  Dl(xp)  =  Wr(xp)  W(xp)  ( 1 3) 
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Figure  2.  Second  equivalent  strictly  passive  system. 


for  some  uniformly  bounded  positive  definite  matrices,  P(xp )  and  Q(xA  and  some 
matrices  L(xp) e  R"*"  and  K'(Xp) and  where 

p/„  j  _  jgfrg)  _  cP(xp)  dx„ 

{'p>  dt  ~  cxp  dt 

It  is  easy  to  see  (Willems  1972)  that  relations  (U)-(13)  are  equivalent  to  the 
following  Riccati  equation: 

P(x,)  +  +  A;(Xp)P(xp)  +  [P(xp)Bpt(xp)  -  C;.(xp)] 

x  [DptfXp)  +  D^Xp)]  - 1  iBl(xp)P(xt)  -  Cp<(Xp)]  +  Q(xp)  =0  (14) 

In  strictly  causal  systems,  when  Dp(xp)  =  0,  the  strict  passivity  relations  are: 

P(Xp)  +  P(Xp)[/tp(Xp)  -  Bp(Xp)K,Cp(Xp)] 

+  C^p(-Xp) -  Bp(Xp)X,Cp(Xp)]TP(Xp) .  -Q(xp)  <0  (15) . 

P(Xp)Bp(Xp)  =  Cj(Xp)  (16) 

Relations  (15),  (16)  can  be  obtained  directly  from  (14)  which,  when  Dp(xp)  is  singu¬ 
lar,  cannot  hold  unless  (16)  holds,  and  then  (15)  follows  immediately.  Let  us  also 
assume  that  Bp(xp)  is  of  maximal  rank.  In  this  case.  (16)  requires  that  Cp(xp)  also  be 
of  maximal  rank  and  that 

Bpr(.Xp)Cj(Xp)  =  Bj(Xp)P(Xp)flp(Xp)  >0  (17) 

which  is  the  equivalent  of  the  relative  conditions  of  linear  time-invariant  systems, 
which  requires  that  strictly  causal  ASP  systems  have  n-m  minimum-phase  finite 
zeros  and  n  arbitrary  poles  (Shaked  1977,  Bar-Kana  1987  a). 

The  reader  may  find  different  definitions  of  passivity  in  the  literature,  but  the 
forms  above  were  selected  because  they  are  direct  results  of  the  basic  stabilizability 
properties  of  the  plant  (Bar-Kana  1989  a),  and  also  because  they  are  very  convenient 
in  the  subsequent  proofs  of  stability.  ASP  systems  are  also  high  gain  stabilizable  (Bar- 
Kana  1989  a),  a  property  that  is  essential  for  the  robustness  of  the  non-linear  adaptive 
controllers.  We  base  our  development  on  exponential  stability  as  defined  by  Hahn 
(1967)  with  application  to  our  specific  structures.  We  assume  that  if  the  plant  is 
stabilizable.  the  resulting  stable  configurations  is  exponentially  stable  as  defined 
below. 


Definition  2  (Hahn  1967) 

Let  the  general  non-linear  system  be  represented  by  the  n-order  vectorial  equation 
•x(0=/(.x,  0  and  let  x  =  0  be  an  equilibrium  point.  The  equilibrium  point  is  called 
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exponentially  stable  if  all  solutions  x(t)  satisfy  the  relation  |x(t)|  <  a|x(0)|  exp  (-  fit)  for 
some  scalars  a  >  0  and  /?  >  0. 

Theorem 1  (Hahn  1967) 

Let  the  right-hand  side  of  x(t)  =  fix ,  f)  have  bounded  continuous  first-order  partial 
derivatives.  Let  the  equilibrium  be  exponentially  stable.  Then  there  exists  a  Lyapunov 
function  F(x,  t)  that  satisfies  estimates  of  the  form 

(a)  a,|x(t)|2^K(x,t)$a2|x(t)!J 

(b)  K(x,tKct3|x(£)|J 

(c)  "-  agat4lx(t)|2,  /  *=  1, 2, ....  n 

Because  this  paper  deals  with  non-linear  systems  of  the  form  ( 1 ),  (2)  rather  than 
linear  time-invariant  systems,  we  cannot  expect  to  find,  or  even  show  the  existence 
of  a  positive  definite  quadratic  Lyapunov  function  of  the  form  F(xp)  =  xj(t)Pxp(t) 
where  P  is  constant  and  positive  definite.  However,  after  some  experience  with  specific 
non-linear  systems,  such  as  robots,  and  because  we  restrict  our  discussion  to  non¬ 
linear  systems  that  are  linear  in  the  control  of  the  form  (1),(2),  we  assume  that 
exponential  stability  of  the  autonomous  system  (1),  with  up(t)  =  0,  implies  the  existence 
of  non-linear  Lyapunov  functions  K(x)  which  are  not  explicit  functions  of  time. 
Furthermore,  since  we  can  always  write  K(xp)  =  xJ(f)P(xp)xp(t),  and  because  then 
F(xP)  =  xJ(r)[P(xp)  +  P(xp)A(xp)  +  AT(xp)P(xp)]xp(t)  =  -xJ(t)Q(xp)xp(t\  we  re¬ 
strict  the  subsequent  discussion  to  system  that  satisfy  the  following  assumption. 

Assumption 

Exponential  stability  of  the  autonomous  system  (1),  with  up(t)  =  0,  implies  the 
existence  of  Lyapunov  functions  of  the  form  F(xp)  =  xJ(t)P(xp)xp(t)  and  a  derivative 
of  the  form  F(xp)  =  xJ(t)Q(xp)xp(t),  where  P(xp)  and  Q(xp)  arc  positive  definite  for 
all  xp  e  (R\  By  using  the  same  Lyapunov  function  with  the  non-autonomous  equation 
(1),  it  is  easy  to  see  that  exponential  stability  implies  BIBO  stability. 

Lemma:  Almost  passivity 

Let  Gp  be  any  non-linear  system  of  the  form  (15),  (16).  Let  H  be  some  stabilizing 
linear  controller  for  Gp,  and  assume  that  we  use  the  inverse  of  H  in  parallel  with  Gp, 
so  that  the  augmented  system  Ga  =  Gp  +  H~l  has  the  form  G„  =  Mp(xp),  Bp(xp), 
Cp(xp),  Dp(xp)}.  Then  Ga  is  ‘almost  strictly  passive’  (ASP).  In  other  words,  the  aug¬ 
mented  system  G„  satisfies  almost  strict  passivity  relations  of  the  form  (1 1)— (13)  (Bar- 
Kana  1989  a). 

The  most  attractive  and  direct  applications  of  the  almost  passivity  lemma  can  be 
implemented  if  some  raw  prior  information  about  the  controlled  plant  is  given  and 
some  estimate  of  the  maximal  stabilizing  output  feedback,  Kmtl,  is  known.  If  it  can 
be  evaluated,  then  by  using  the  parallel  feedforward  Dp  =  K„pt,  we  obtain  an  aug¬ 
mented  system  that  can  easily  and  reliably  be  controlled  by  almost  any  reasonable 
control  method.  Since  instead  of  controlling  the  plant  output  yp(r)  =  Cp(xp)xp(r)  we 
now  control  the  augmented  output  ya(t)  =  yp(t)  +  Dpup(t),  the  lemma  is  especially 
useful  in  systems  that  maintain  stability  with  high  feedback  gains.  In  this  case,  the 
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supplementary  gain  may  be  very  small  and  may  not  essentially  affect  the  controlled 
plant  output.  For  example,  in  very  common  cases,  if  the  gain  of  the  plant  is  ten  and 
the  largest  admissible  gain  is  evaluated  to  be  about  100,  then  we  only  add  </100  = 
0  01  in  parallel  with  the  plant.  It  is  amazing  how  such  a  small  parallel  term  changes 
the  stability  properties  of  the  plant  (Bar-Kana  and  Kaufman  1987). 

To  end  these  remarks,  we  note  that  Kmtt  (or  H  in  general),  is  indeed  required  to 
guarantee  the  stability  of  the  fictitious  closed-loop  system,  and  thus  the  almost  passiv¬ 
ity  of  the  augmented  open-loop  system  (15),  (16),  but  only  in  a  very  weak  sense.  The 
fictitious  stable  system  is  by  no  means  required  to  be  good.  Thus,  stability  is  only  a 
sufficient  condition  that  enables  the  subsequent  non-linear  control  to  impose  the 
desired  behaviour  of,  for  example,  a  well  designed  low-order  reference  model  upon 
the  (large)  plant. 

The  proof  of  stability  of  the  subsequent  multivariable  adaptive  control  systems 
extends  the  applicability  of  simple  adaptive  controllers  to  non-linear  systems,  and 
some  references  at  the  end  of  this  paper  may  give  a  good  illustration  about  what 
can  be  done  in  stationary,  non-stationary,  or  non-linear  systems  when  appropriate 
feedforward  can  be  selected  a  priori  (Bar-Kana  and  Kaufman  1983,  1984,  1985  a,  b, 
1988,  Bar-Kana  et  al.  1983,  Bar-Kana  1987  a,  b)  or  if  it  must  also  be  computed 
adaptively  (Bar-Kana  1986  b,  1987  b). 


3.  Model  following  problem  for  non-linear  ASP  systems 

In  §  4  we  show  how  to  select  the  necessary  parallel  configurations  in  practical 
design.  This  section  assumes  that  the  adaptive  control  is  applied  to  the  augmented 
ASP  system.  In  realistic  environments  we  must  take  into  consideration  the  input  and 
output  disturbances,  and  therefore  our  augmented  ASP  controlled  plant  has  the 
following  representation 

*p(0  =  Ap(xp)xp(t)  +  Bp(xp)Up(t)  +  dt(Xp,  t)  (18) 

yfl(f)  =  Cp(Xp)xp(t)  +  Dp(Xp)Up(t)  +  d0(Xp,  t)  (19) 

where  the  matrices  Ap(xp),  Bp(xp),  Cp(xp),  and  Dp(xp)  are  uniformly  bounded.  We 
assume  that  d,(xp,  t)  and  d0(xp,  t)  are  bounded  input  and  output  disturbances  that 
can  also  represent  some  inaccuracies  of  the  representation  of  the  real  plant.  The  plant 
is  assumed  to  be  ASP;  however,  it  may  also  be  very  large  and  basically  unknown. 
The  output  of  the  plant  is  required  to  follow  the  output  of  the  (possibly)  very  low- 
order  model 


■*m(0  =  /Uxm)xm(0  +  Bm(xm)um{t)  (20) 

yjt)  =  Cm(.xm)xm(t)  +  Dm(xm)um{t)  (21) 

The  model  represents  the  desired  behaviour  of  the  plant,  but  is  otherwise  free  and 
does  not  have  to  be  the  result  of  some  prior  modelling  of  the  plant.  It  is  also  allowed 
to  be  of  any  arbitrarily  high  or  low  order.  It  will  usually  be  a  linear  time-invariant 
model,  and  we  represent  it  in  the  general,  non-linear  form  only  for  the  sake  of  the 
generality  of  the  solution. 

We  now  define  the  output  tracking  error 

ey(t)  =  ym(t)-yp(t)  (22) 

and  use  the  foilowing  simple  multivariable  adaptive  control  algorithm  (Fig.  3) 
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(Bar-Kana  and  Kaufman  1983,  1985  b,  Bar-Kana  1987  a) 

uP(t)  —  Kty(t)ey(t)  +  KXm{t)xm(t)  +  KUm(t)um(t)  =  K(t)r(t)  '  (23) 

K(t)  =  lKty(t)  KxJt)  KuJt)]  (24) 

rT(t)  =  [eJ(t)  xTm(t)  uKr)]  (25) 

where  the  adaptive  gains  are  computed  as  a  combination  of  'proportional’  and 
‘integral’  gains: 

KpM  =  t>>>(i)eJ(£)r,y  t?y(f)u^(r) ru>ri]  =  ey(r)rT(f)r  (26) 

K,(t)  =  ey(t)rT(t)r-oK,(t)  (27) 

K(t)  =  Kp(t)  +  K,(t)  (28) 

where  T  and  T  are  selected  positive  definite  scaling  matrices. 


Figure  3.  Complete  adaptive  control  system. 


The  basic  algorithm  (23)-(28),  without  the  o-term  in  (27),  was  introduced  by  Sobel 
et  al.  (1979,  1982)  and  extended  by  Bar-Kana  anc-.  Kaufman  (1983,  1984,  1985  a).  The 
gain  K,y(t)  takes  care  of  the  stability  of  the  controlled  plant,  while  KXin(t)  and 
KuJt )  help  improve  the  performance  of  the  adaptive  model-following  system  and 
even  achieve  perfect  following  in  ideal  clean  environments  (Bar-Kana  and  Kaufman 
1985  a). 

The  ff-term  is  a  ‘forgetting’  factor  (Ioannou  and  Kokotovic  1983)  that  avoids 
divergence  of  the  integral  gain  in  the  presence  of  disturbances.  Since  it  is  only  a  low- 
pass  filtering  of  e>,(r)rT(r)r,  this  gain  will  not  diverge  unless  the  output  tracking  error 
diverges.  Furthermore,  the  adaptive  gain  now  moves  up  and  down  according  to  the 
specific  situation  (errors),  as  adaptive  gains  should  do.  The  proportional  gain  Kp(t) 
is  added  for  its  immediate  action.  Owing  to  this  action,  it  heavily  punishes  high 
tracking  errors  and  quickly  brings  the  system  towards  tracking  with  small  errors 
(l.andau  1979). 

The  efficiency  of  the  integral  term  K^(f)  was  also  recently  demonstrated 
(Nuv>baum  1983,  Willems  and  Byrnes  1984,  Morse  1984.  Bar-Kana  1989  d)  with 
respect  to  the  problem  of  the  unknown  sign  of  the  high  frequency  gain. 

The  subsequent  theorem  of  robust  adaptive  stability  extends  the  applicability  of 
this  simple  adaptive  algorithm  to  ‘almost  passive’  multivariable  nor.-linear  systems, 
and  gives  the  necessary  theoretical  background  for  the  successful  applications  of  the 
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algorithm  in  non-linear  or  non-stationary  systems  (Bar-Kana  1987  a,  b,  1988  c,  Bar- 
Kana  and  Kaufman  1988). 

Theorem  2 

The  model  reference  adaptive  controller  (23)— (28)  guarantees  robust  adaptive 
stabilization  of  the  ASP  plant  ( 18)— (19)  in  the  presence  of  any  bounded  input  com¬ 
mands  and  input  or  output  disturbances. 

Proof 

For  the  proof,  see  Appendix  A  for  causal  and  Appendix  B  for  strictly  causal  ASP 
systems.  . 

‘Robust  adaptive  stabilization’  means  that  all  values  involved  in  the  adaptation 
process,  namely  states,  gains  and  errors,  are  bounded  in  the  presence  of  any  bounded 
input  commands  and  input  or  output  disturbances.  It  should  be  mentioned  that  this 
is  a  very  general  result  because  we  do  not  assume  any  prior  knowledge  on  the  plant, 
and  because  a  perfect  tracking  solution  may  not  exist  since  the  model  reference 
represents  only  some  desired  behaviour  of  an  ideal  plant.  However,  because  stability 
of  the  non-linear  adaptive  controller  with  ASP  plants  is  guaranteed,  the  non-linear 
controller  gains  are  called  to  make  the  tracking  errors  arbitrarily  small,  as  a  trade¬ 
off  between  large  gains  and  large  errors.  Asymptotic  perfect  following  or  asymptotic 
perfect  regulation  are  obtained  in  idealistic  conditions.  Note  that  we  do  not  need  the 
usual  assumptions  that  the  dynamics  (Seraji  1989)  or  the  adaptation  (Ortega  and 
Spong  1988)  are  slow  in  order  to  guarantee  stability  of  this  adaptive  control  system. 

Although  Theorem  2  guarantees  the  boundness  of  all  dynamic  values  involved  in 
the  adaptation  process,  some  undesired  phenomena  were  observed  in  the  adaptive 
algorithms  with  forgetting  factors  (Astrom  1983,  Anderson  1985,  Bar-Kana  1988  b, 
1989  c,  Fortesque  et  al.  1981,  Hsu  and  Costa  1987)  when  no  external  excitation  is 
present.  In  our  configuration,  these  effects  are  apparent  in  particular  if  the  controlled 
plant  is  unstable  (Hsu  and  Costa  1987,  Bar-Kana  1988  b).  In  this  case,  the  adaptive 
gain  may  initially  increase  due  to  the  initial  errors,  and  reach  some  stabilizing  value. 
As  a  result,  the  states  and  outputs  move  towards  their  zero  value,  and  a  decrease  of 
the  output  leads  to  a  decrease  of  the  gains  towards  their  zero  value.  However,  since 
the  equilibrium  point  ( y0(r)  =  0,  K(r)  =  0)  is  unstable,  and  since  on  the  other  hand  all 
values  are  bounded,  we  can  expect  that  the  system  will  reach  some  other  limiting 
trajectories  or  equilibrium  points  (Willems  1971). 

Since  the  gain  initially  has  a  stabilizing  effect,  the  states  and  outputs  come  very 
close  to  their  zero  value.  Therefore,  the  gains  must  go  well  into  the  unstable  region 
before  their  destabilizing  effect  is  felt.  We  therefore  have  a  sudden  ‘burst’  of  error  that 
brings  the  gains  into  the  stable  region  again,  and  so  on  (Mareels  and  Bitmead  1986, 
Bar-Kana  1988  b,  1989  c).  If  we  use  fast  adaptation,  which  in  our  case  means  using 
high  adaptation  coefficients  T  and  T,  the  adaptive  gains  increase  immediately  to  such 
values  that  the  tracking  errors  can  be  reduced  to  less  than  noticeable  dimensions 
(Bar-Kana  1988  b).  However,  when  the  phenomenon  appears,  it  finally  settles  at,  or 
about,  the  minimum  stabilizing  gain  Km!n.  As  shown  below,  we  can  use  this  value  to 
totally  eliminate  the  bursts. 

If  we  replace  Kty(t)  in  (24)  by 

K,y(f)  +  K0 


(29) 
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where  K0  is  any  stabilizing  gain 

Ko>Km{n  .  (30) 

then  the  bursting  phenomena  are  eliminated,  because  if  either  ||ya(r)||,  ||K(t)||  or 
||K(t)ya(t)||  become  small,  the  adaptive  system  enters  the  domain  of  attraction  of  the 
equilibrium  point  (y4(f)  =  0,  K[t)  =  K0)  (Appendix  C).  Since  it  can  be  shown  that  the 
equilibrium  point  (xp(l)  =  0,  K(t)  =  K0)  is  both  asymptotically  stable  (attractive)  and 
unique,  the  eventual  existence  of  other  equilibrium  points  is  excluded  (Appendix  D). 

In  conclusion,  although  in  general  prior  knowledge  of  the  bounds  of  stability, 
Kmin-Kmtl,  which  is  admissible  with  fixed  controllers,  does  not  guarantee  stability 
with  non-linear  controllers  (Aizerman  and  Gantmacher  1964),  stability  is  guaranteed 
in  the  specific  case  of  this  particular  non-linear  adaptive  algorithm. 


4.  Adaptive  control  of  manipulators 

A  state  space  representation  of  the  manipulator  is  (Slotine  and  Li  1988): 

xp(t)  =  <4p(Xp)xp(t)  +  Bp(Xp)Up(t)  +  d(Xp) 


'xi(t)' 

=  *p(£)  = 

*2(0 


yP(  0  = 


where  xp  e  CR2",  yp  e  R2",  up  e  R",  and  where  x,(t)  is  the  n  position-state  vector  and 
x2(f)  is  the  n  velocity-state  vector.  The  various  matrices  of  corresponding  dimensions 
in  (31),  (32)  are: 


0  I 

p(xp)  = 

1_0  -H~l(xl)C(xux1)j 


Sp(Xp)  = 


H_,(x,) 


0 

L-jr‘(xi)*(xi)_ 


d(xp)  = 


where  d(xp)  is  considered  to  be  a  state-dependent  disturbance  generated  by  the  gravity 
g(*i)- 

We  first  neglect  the  gravity  term  in  order  to  establish  some  basic  relations  and 
show  that  the  ideal  system  can  be  stabilized  by  a  feedback  controller  of  the  form 

mp(0=  “K(*i  +*x2)  (36) 

for  any  positive  definite  matrix  K  and  positive  coefficient  a.  In  other  words,  the 
closed-loop  system 


xM  =  ACt(Xp)Xp(t) 


where 


0  / 

Ar/,(X„)  — 

L-H-‘(x,)K  -H-'(Xl)[C(Xl, 


x2)  +  *K] 
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is  uniformly  asymptotically  stable.  To  this  end  we  select  the  positive  definite 
Lyapunov  function 

V(xp)  =  xT,(t)P(x,)x,(t)  (39) 

where 


P(x,)  = 


~K 

0 


0 

H(x  i) 


(40) 


and  therefore,  the  derivative  of  the  Lyapunov  function  along  the  trajectories  of 
(37)  is 


V{xP)  =  xJ(0[P(xp)  +  P(xp)Acl(xp)  +  /tfL(xp)P(xp)]xp(r) 
0  K-KT 


=  xl(t) 


Kr-K  H(Xl)-C{xux2)-CT(xux2)-ot(K  +  Kr) 


xP(t)  (41) 


and  where  we  use  (Slotine  and  Li  1988)  the  manipulator  property  that  H(x2)  is 
positive  definite  and  that  H(xi)  =  C(xux2)  +  CT(x1(  x2),  to  obtain  finally 

K(xp)=-2«xl  (t)Kx2(t)  (42) 

Although  ^(xp)  is  only  negative  semidefinite  rather  than  negative  definite,  uniform 
asymptotic  stability  of  (37)  is  guaranteed  by  the  following  theorems,  which  we  formu¬ 
late  in  a  form  that  fits  our  specific  problem. 


Theorem  3  (Lyapunov:  see  Vidyasagar  1978) 

Let  K(xp)  be  a  positive  definite  Lyapunov  function  of  xp(t).  Then,  the  system  (37) 
is  globally  asymptotically  stable  if  and  only  if  the  derivative  of  the  Lyapunov  function 
along  the  trajectories  of  (37)  is  a  negative  definite  function  of  xp(t)  (Vidyasagar  1978). 

Theorem  4  (LaSalle  1981) 

Under  the  assumptions  of  Theorem  3,  the  system  (37)  is  stable  if  the  derivative  of 
the  Lyapunov  function  along  its  trajectories  is  negative  semidefinite.  The  trajectories 
of  the  system  finally  reach  the  region  defined  by  ^(xp)  =  0  (LaSalle  1981).  The  system 
(37)  is  still  asymptotically  stable  if  the  negative  semidefinite  derivative  V(xp)  is  not 
identically  zero  along  any  non-trivial  solution  of  (37). 

In  our  case,  since  V(xp)  is  quadratic,  the  limiting  trajectories  condition  K(xp)  =  0 
is  equivalent  to  x2(t)  =  x,(t)  =  0,  which  implies  x t  (r)  =  0,  and  therefore  xp(f)  =  0. 

Although  stabilizability  of  the  robotic  manipulator  through  some  PD  controller 
was  established  using  a  negative  semidefinite  derivative  of  the  Lyapunov  function, 
the  passivity  relations  needed  for  the  convergence  of  the  non-linear  adaptive  control¬ 
ler  require  that  the  underlying  K(xp)  be  negative  definite,  or  have  a  negative  definite 
upper  bound  with  respect  to  xp(r).  Now,  since  for  any  x2(r)  #  0  we  have  P(xp)>0 
(strictly  positive)  and  ^(xp)  <  0  (strictly  negative),  which  implies  that  P(xp)  is  strictly 
decreasing,  then  ^(XpJ/K^x,,)^  -/?,  for  some  scalar  /?>0,  sufficiently  small.  Then 
P(xpK  -PV(xp)  and 

^(xp)=  -2axJ(r)Kx2(rK  -/fcJ(r)P(Jc,)xp(0=  -PV(xp) 


(43) 
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and  therefore  all  passivity  lemmas  and  the  proofs  of  the  robust  stability  of  the  adaptive 
systems  are  valid  in  the  case  of  robot  manipulators. 

The  stabilizability  of  a  robot  via  PD  controllers  is  only  needed  as  underlying 
knowledge.  The  implementation  of  the  simplified  adaptive  controller  does  not  actu¬ 
ally  make  use  of  the  derivative  of  the  output  and  we  can  only  use  position  sensors. 
Instead  of  the  fictitious  stabilizing  PD  controller  H{s)  given  by  (36)  we  use  the 
configuration  H~l(s)  given  by 


yAs)=fh^(s)  (44) 

in  parallel  with  the  controlled  plant  (Bar-Kana  1987  a),  where  Kmit  is  some  reasonable 
evaluation  of  the  maximal  admissible  gain  K  in  (36),  which  theoretically  may  be 
arbitrarily  large.  Since  we  treat  the  gravity  term  as  some  unknown  and  unmeasured 
bounded  disturbance,  and  since  we  do  not  use  any  identification  algorithm,  the 
adaptive  controller  can  only  guarantee  stability  with  respect  to  the  boundness  of  all 
states,  gains,  and  tracking  errors,  in  general.  However,  since  the  adaptive  algorithm 
moves  the  gains  up  and  down  in  order  to  reduce  the  errors,  this  residual  error  may 
be  negligible  (Bar-Kana  1987  a,  Laniado  et  al.  1989).  The  ideal  perfect  tracking  is 
thus  given  up  to  obviate  the  need  for  more  complex  algorithms  using  identifiers  in 
the  closed-loop. 


5.  Conclusions 

In  this  paper  the  basic  stabilizability  properties  of  systems  have  been  used  to 
achieve  'almost  passive’  configurations  for  a  class  of  non-linear  continuous-time  sys¬ 
tems.  It  has  also  been  shown  that  the  ‘almost  passivity’  property  of  systems  can  be 
used  to  extend  the  applicability  of  simple  non-linear  adaptive  algorithms  to  non¬ 
linear  systems,  since  it  guarantees  the  global  stability  and  robustness  of  the  adaptive 
control  systems  in  realistic  environments.  Other  prior  knowledge  and  conditions, 
usually  needed,  such  as  the  order  of  the  plant,  the  relative  degree,  inverse  stability, 
stationarity,  external  excitation,  etc.  are  immaterial  in  this  context.  The  paper  suggests 
a  simple  and  robust  adaptive  controller  for  non-linear  systems  with  unknown  par¬ 
ameters,  with  particular  application  for  robot  manipulators. 
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Appendix  A 

Proof  of  stability  of  the  adaptive  control  algorithm  (23)-(28) 

The  adaptive  controller  is  desired  to  bring  the  control  system  to  that  ideal  situ¬ 
ation  where  perfect  output  tracking  is  performed.  Therefore,  we  shall  first  question 
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the  possible  existence  of  some  ‘ideal’  bounded  trajectories  xj(t)  and  ‘ideal’  control 
xj(f)  that  satisfy  some  unknown  differential  equation 

x*(t)  =  A*(x*p)x*(t)  +  B*(x*)u*(t)  (A  1) 

tf(t)  =  Cp(xp)x*(t)  +  Dp(xp)u*(t)  (A  2) 

where  the  matrices  ,4*(xJ),  BJ(xJ),  Cp(xp),  and  Dp(xp)  are  uniformly  bounded.  Note 
that  the  ideal  system  (A  1)  may  be  entirely  different  from  the  plant  to  be  controlled, 
and  that  only  their  output  equations  (A  2)  are  assumed  to  be  equal. 

We  now  try  to  represent  xj(t)  and  u*(t)  as  linear  combinations  (Bar-Kana  and 
Kaufman  1985  a)  of  xM(t)  and  um(t) 

x*(t)  =  Xxn(t)  +  Uum(t)  (A3) 

«?(*)  =  £*mx«.(t)  +  £«„um(t)  (A  4) 

If  the  plant  states  reach  the  ideal  trajectories,  xp(t)  =  x*(t)  we  require  that 

yP(t)  =  Cp(x*)x*(r)  +  Dp(x*)u*(t)  =  ym(t)  (A  5) 

Substituting  (A  2)-(A  4)  and  (22)  into  (A  5)  gives 

Cp(x*p)Xxm(t)  +  Cp(x*)Uum(t)  +  Dp(x*)RXmxm(t)  +  Dp(x*pKmum(t) 

=  Cm(xm)xm(t)  +  DJxm)um(t)  (A  6) 


or 


[C,(x*)X  +  Dp(x*)KXn  -  Cm(xm)]xm(t) 

+  [Cp(x*)U  +  DP(x*)ftUm  -  Dm(xJ]  um(t)  =  0  (A  7) 

It  is  clear,  of  course,  that  if  the  meauring  matrices,  Cp,  Cm ,  Dp,  and  Dm  are  all 
constant,  (A  7)  usually  has  a  solution  (at  least)  for  the  unknown  matrices  X,  U , 
KIm,  and  KUm,  because  it  has  many  more  variables  than  equations.  In  the  general 
non-linear  case,  we  may  think  that  these  unknown  matrices  are  functions  of  xj.  In 
reality,  xj  is  any  bounded  trajectory  that  may  satisfy  (A  7)  and  is  free  otherwise. 
Therefore,  we  can  assume  that  (A  7)  is  satisfied,  in  general. 

Notice  that  even  if  (A  7)  assumes  that  the  ideal  trajectories  exist,  perfect  tracking 
by  the  real  plant  may  not  be  possible,  in  the  most  general  case,  when  the  model  and 
the  plant  are  totally  different.  We  leave  this  result  in  its  most  general  form,  to  show 
that  even  in  this  case,  when  perfect  tracking  is  not  possible,  the  tracking  enors  can 
be  reduced  arbitrarily  as  a  trade-off  between  large  adaptive  gains  and  large  errors. 
Notice  also  that  none  of  the  solutions  that  were  discussed  above  are  needed  for 
implementation  of  the  adaptive  algorithm  (23)— (28).  The  conditions  of  existence  that 
were  established  for  the  idealistic  perfect  following  solution  will  subsequently  be  used 
for  the  proof  of  stability. 

Since  we  want  plant  states  to  reach  the  ‘ideal’  states  that  allow  perfect  tracking, 
we  define  the  state  error  as 


(A  8) 


exW  =  x*(t)-.xp(t) 
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and  the  output  error  as 

ey(t) = -  yP(0  =  y*(0  -  C  cp(x*)  -  cp(xp)]x*(r) 

-  [Dp(x*)  -  Dp(xp)]u*(t)  - yp{t)  (A  9) 

ey(t)  =  Cp(xp)xp(t)  +  Dp(xp)u*{t)  -  Cp(xp)xp(t)  -  Dp{xp)up(t) 

-  [Cp(x*) -  Cp(xp)]x»(r) - d0(t)  (A  10) 

ey(t)  =  Cp(xp)ex(t)  +  Dp(xp)lRXmxm(t)  +  £„„um(t)] 

+  Dp{xp)Rtye,(t)  -  Dp(xp)fZ,yey(t) 

-  Dp{xp)K(t)r(t)  -  [Cp(xJ)  -  Cp(xp)]x*(t) 

-LDp(x*)-Dp(xp)}u*(t)-d0(t)  (All) 

ey(t)  =  Cp(xp)ex(t)  -  D„(xp)[K(t)  -  ft}r(t)  -  Dp(xp)fi,yey(t) 

-  [Cp(x*)  -  Cp(xp)]x*(t)  -  [Dp(x*)  -  £>p(xp)]up(r)  -  d0(t)  (A  12) 

Denote 

^iW  =  [Cp(x*)-Cp(xp)]x*(t)  +  [Dp(x;)-Dp(xp)]«*(t)  +  do(0  (A  13) 

Then 

[/  +  Dp(xp)iZty-]ey(t)  =  Cp(xp)ex(t) -  Dp(xp)[K(t ) -  tf] r(t )  -  d,(t)  (A  14) 


and  finally 

ey(t)  =  C^eAt)  -  Dpc(xp)[K(t)  -  K]r(t)  -  [/  +  Dp(xp)  *.,]-»  </,(»)  (A  15) 
where 

R  =  lKty  KXm  KUJ  (A  16) 

and  where  Cpe(xp)  and  Dpc(xp)  were  defined  in  (9)— (10). 

Differentiating  ex(t)  in  (A  8)  gives 

<U0  =  x*(t)  ~  xp(t)  =  x*(t)  -  Ap(xp)x*(t)  +  Ap(xp)x*(t)  -  xp(t)  (A  17) 

ex(t)  =  A*(x*)x*(t)  +  Bp(xp)up(t)  -  Ap(xp)x*(r) 

+  /4p(x,)x*(t)  -  Ap{xp)xp{t)  -  Bp(x„)Kmt)  -  d,(t)  (A  18) 

<U0  =  <4p(-xp)e,(f)  -  Bp(xp)K(t)r(t)  +  Bp(xp)KXmxm(t)  +  Bp(xp)KUnum(t) 

-  Bp(xp)Ktyey(t)  +  Bp(xp)iZtyey(t)  +  [-4*(x*)  -  4p(xp)]Xxm(r) 

+  [A*p{x*p)-  Ap(xp)]  Uum(t)  -  d,(t )  (A  19) 

ex[t)  =  Ap(xp)ex(t)  -  Bp(xp)Ktyey(t)  -  B„(xp)[K(t)  -  K]r(t) 

+  {[A*(x*)~  Ap(xp)]X  +  [B*(x*)~  Bp(xp)-]KXm}xn(t) 

+  I tA*p(x>)  -  4p(xp)]  U  +  [BJ(xJ)  -  Bp{xp)]RuJum(t)  -  d,(t) 


(A  20) 
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Substituting  ey(t)  from  (A  15)  gives 

e,(t)  =  -  *,(*,)*.,[/  +  Dp(xp)Rty]  ~ 1  Cp(xp) 

-  Bp(xp)*,y[/  +  Dp(xp)HtyrlDp(xp)[K(t)  ~  £]r(f) 

-  Bp(xp)ZtyU  +  Dp(xp)Rtyyld0(t)  -  Bp(xp)[K(t)  -  Z]r(t) 

+  (|\4p*<x*)  -  Ap(xp)-]X  +  [fl*(x*)  -  BP(xp)]£,JxM(t) 

+  {[A*(x*)  -  Ap(xp)] U  +  [B*(x*) -  Bp(xp)] Xyjujt)  - d,(t)  (A  21) 

Since 

-  Bp(xp)£ey[/  +  Dp(xp)J<!ey]  - 1  Dp(xp)  +  Bp(xp) 

=  Bp(xp){l  +  £,y[/  +  Dp(x,)R.,yl  Dp(xp)} 

(using  the  matrix  inversion  lemma) 

=  Bp(xp)  [/  +  Dp(xp)£<y]  - 1  =  B„(xp)  (A  22) 

we  get 

ex(t)  =  A„{x,)e,(t)  -  B^x.mt)  -  £]r(t)  +  F(t)  (A  23) 

where  A^x,,),  flpjx,,),  Ktc(x„)  were  defined  in  (4)— (10)  and  where  F(t)  is  the  residual 
uniformly  bounded  term 

F(t)  =  \lA*p(x'p)  -  Ap(xp)-]X  +  [B*(x*)  -  Bp(xp)-}RxJxm(t) 

+  {[Ap*(x*)  -  Ap(xp)]  V  +  [BJ(xJ)  -  Bp(xp)-]R„Jum(t) 

—  Bp{xp)Rtedi(t)  —  dt(t)  (A  24). 

The  following  quadratic  Lyapunov  function  will  be  used  for  the  proof  of  stability 
of  the  adaptive  system  (A  22)  and  (27): 

K(t)  =  el(t)P(xp)ex(t)  +  tr  -  K]  r~ 1  [K,(r)  -  £]T)  (A  25) 

where  P(xp)  is  the  positive  definite  matrix  defined  in  (1 1)-(13)  and  T  is  the  positive 
definite  scaling  factor  defined  in  (27).  The  derivative  of  K(t)  ‘along  the  trajectories’  of 
(A  22)  and  (27)  gives: 

Ht)  =  eTx(t)P(xp)ex(t)  +  eTx(t)P(xp)ex(t)  +  erx(t)P(xp)ex(t) 

-2tr{[K/(f)-^]r-‘/C7(t)}  (A  26) 

Substituting  ex(t )  from  (A  23)  and  K,(t)  from  (27)  gives 
I>(t)  =  eI(f)P(xp)e,(t)  +  eJ(r)Ajc(xp)P(xp)eI(t) 

-  rT(t)[K(f)  -  K]TBjc(xp)P(t)ex(t)  +  FT(t)P(t)ex(t) 

+  eJWPfXpM^XpKW  -  eI(t)P(^P)Bpc(.xp)CK(t)  -  £]r(r) 

-  eTJt )  P(xp)F(f)  +  2  tr  {[K,(t)  -  K]  f 1  [ey(t)rT(t)  T  -  aK,( t)]}}  (A  27) 

^(0  =  eI(t)[P(.xp)  +  P(xp)Ape{x„)  +  A  ^(XpJPfx,,)]^) 

-  el(t )P(xp)Bpc(xp){K(t)  -  K]r(f)  -  rT(0[K(f)  -  X]TBjf(xp)P(f)cx(r) 

+  2t?I(f)P(xp)F(t) 

-  fJ(r)CFC,(f)  -  £]r(t)  +  rT(t)[K,(t)  -  K]Tey(t) 

-2fftr![K,(f)-£]r-lKl(r)}  (A  28) 
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Substituting 

K,(t)  =  K(t)-Kp(t)  =  K(t)-ey(t)rT{t)r  (A  29) 


V(t)  =  eI(t)[P(xp)  +  P(x,M„(xp)  +  ^(xp)P(xp)]<?x(r) 

-  eKOP^B^x,,)!;^)  -  £]r(t)  -  rT(r)[K(t)  -  £]TB*  (xp)P(t)e*(t) 

+  2eJ(r)P(xp)F(t) 

-  <£(f)[K(t)  -  ^]r(f)  +  rT(t)[K(r)  -  £]Tey(t) 

.  —  2eJ(t)ey(t)rT(t)rr(t) 

-2<rtr{[K/(t)-^]r-,[K/(t)-^]T} 

-2fftr{[X,(t)-^]r-1^T)  (A3°) 

Using  (1 1)-(13)  and  (A  15)  gives 

fy)  =  -eTx  (t)Q(x,)ex(t)  -  eJ(t)LT(xp)L(x,)ex(t) 

-  eI(t)Cje(xp)[K(t)  -  £]r(t)  -  rT(t)[K(t)  -  ^]TCjc(xp)e,(t) 
-eI(t)LT(xp)^(xp)[K(t)-/?]r(t) 

+  rT(r)[K(t)  -  £]TWT(xp)L(xp)e,(t) 

+  eJ(t)Cjc(xp)[K(t)-/?]r(t) 

-  rT(t)[K(t)  -  £]TZ>]*(xp)[K(t)  -  £]r(r) 

-  dftr)[/  +  Dp(xp)£,y] " 1  [K(t)  -  K]r(t ) 

+  rT(t)[K(t)-j?]TCpe(xp)ex(t) 

-  rT(t)[K(t)  -  £]tDp t(xp)[K(t)  -  X]r(t) 

-  rT(t)[K(t)-K]T[I  +  Dp(xp)K,y]~  ldt(t) 

-2  ej(t)e,(t)rr(t)rr(t) 

-  2 a  tr  {[K,(t)  [K,(t)  -  £]T} 

-2(7  tr{[/c,(t)-£]r-1£T) 

+  2el(t)P(xp)F(t) 

Rearranging  and  using  (13)  gives 

K(t)=  -el  (t)Q(xp)ex(t) 

-{L(xp)ex(t)-W(xp)lK(t)-KMt)\T 

x{L(xp)ex(t)-W(xPKK(t)-K-}r(t)} 

-2el(t)ey(t)rT(t)rr(t) 

-  2(7  tr  {[K,(t)  -  K]  r  ‘ 1  [K,(r)  -  £]T  1 
-2(7  tr{[K,(t)-K]r-‘KT} 

-2rT(t)lK(t)-K]rU  +  Dp(xp)Ktyr'd^ 

+  2eTx(t)P(xp)F(t) 


(A  32) 
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It  is  easy  to  see  that  some  positive  coefficients  a,-a10  exist  such  that 

II 6,(011 2  -  ||[£(t)  “  £]r«ll2  -  at3  IMOII4  -  lky(0ll2!M0ll2 
-*5IMf)ll2ll«m(0ll2-at6tT||/C,(0-K||2+a7ff||K/(0-^|| 

+  *«l[K(0- 1^1(01 +«9«e.WB  +  a«0B«»WI-|«,WI  (A  33) 

If  either  ||ex(t)||,  ||[/C(t)-£]r(r)||,  or  ||K,(t)-£||  increase  beyond  some  bound, 
the  negative  definite  quadratic  terms  in  (A  33)  become  dominant,  and  thus  P'(r)  be¬ 
comes  negative.  The  quadratic  form  of  the  Lyapunov  function  V(t)  then  guarantees 
that  all  the  dynamic  values,  namely  ex(r),  K,{t),  and  ey(t)  are  bounded.  In  idealistic 
situations  with  d,(t)  =  0  and  d0(t)  =  0.  F(t)  may  vanish  and  thus  allow  perfect  following 
(Bar-Kana  and  Kaufman  1985  a).  If,  however,  the  (7-term  were  missing  in  (27),  then 
the  adaptive  controller  could  not  have  avoided  those  situations  when  ||/C/(r)  —  iC|| 
might  increase  without  bound  in  spite  of  ||[K(t)-  £]r(t)||  being  small.  As  shown  in 
§  2,  the  properties  of  ASP  systems  guarantee  that  theoretically,  the  adaptive  system 
remains  stable,  even  if  the  gains  increase  without  bound.  Therefore,  on  one  hand  one 
can  use  very  high  adaptation  scaling  factors  such  that  the  weighting  of  the  negative 
terms  is  dominant  and  the  tracking  errors  become  very  (arbitrarily)  small.  However, 
the  gains  may  also  reach  unnecessarily  or  unrealistically  large  values.  The  negative 
quadratic  (7-term  in  (A  32)  shows  that  such  a  situation  is  not  possible  and  all  values 
are  therefore  bounded,  Moreover,  the  adaptive  gains  now  move  up  and  down  accord¬ 
ing  to  the  required  performance  (small  tracking  errors),  as  a  practical  implementation 
of  the  adaptive  control  aim  to  fitting  the  right  gains  to  the  right  operational  situation. 
This  way,  on  one  hand  one  can  have  smooth  tracking  with  small  adaptive  gains  (and 
low  cost  of  control  and  low  noise  amplification)  when  the  tracking  problem  is  easy. 
On  the  other  hand,  the  adaptive  gains  increase  to  any  large  desired  value  when 
required  by  the  transient  mode  or  the  desired  manoeuvrability,  so  that  the  tracking 
errors  remain  small.  This  property  of  the  adaptive  algorithm  may  give  it  an  advantage 
over  fixed  linear  controllers,  even  if  the  plants  (not  the  operational  environment)  were 
perfectly  known. 

As  already  mentioned,  only  the  Kty(t)  terms  are  needed  for  the  stability  of  the 
adaptive  control  system.  The  effect  of  the  other  terms,  like  the  proportional  adaptive 
gains  or  the  gains  multiplying  xm(t)  and  um(t)  is  expressed  by  the  corresponding 
supplementary  negative  terms  in  (A  32)  and  (A  33),  which  increase  the  rate  of  con¬ 
vergence  and  also  reduce  the  bounds  of  the  final  bounded  error  (where  k'(t)  is  not 
necessarily  negative)  thus  improving  the  performance  of  the  adaptive  controller. 
Notice  that  without  external  disturbances,  the  smaller  the  tracking  error,  the  closer 
xp(t)  is  to  the  ideal  trajectory  and,  in  turn,  the  smaller  the  residual  term  F(t).  Although 
in  general,  the  adaptive  controller  does  not  promise  perfect  tracking,  in  most  situ¬ 
ations  the  tracking  errors  are  very  small  and  sometimes  negligible.  At  the  price  of 
bounded  rather  than  vanishing  errors,  the  adaptive  controller  maintains  the  stability 
of  the  adaptive  control  system  in  most  practical  non-idealistic  and  ever  changing 
situations  (Bar-Kana  1988  c,  1989  d,  Bar-Kana  and  Kaufman  1988,  Morse  and  Oss- 
man  1989). 

Appendix  B 

Adaptive  control  of  strictly  causal  almost  passive  systems 

The  adaptive  controller  (37)-(42)  can  also  be  applied  to  strictly  causal  ASP  sys¬ 
tems,  that  satisfy  relations  (15),  (16). 
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The  proof  of  stability  in  such  a  case  follows  immediately  from  the  proof  of 
Appendix  A,  if  we  substitute  Dp(xp)  =  0,  L(xp)  =  0,  W(xp)  =  0. 

Then 

V(t)=  ~el(t)Q(xp)ex(t) 

-2eJ(t)ey{t)rT(t)rr(t) 

-2<rtr{[/Cf(t)-/C]r-1[/C,(t)-K]T} 

-1(5  tr{[K,(t)-£]r-l£T} 

-2rT(t)[K(t)-*]Td,(r) 

+  2eTx(t)P(xp)F(t)  (Bl) 

and  the  stability  results  following  (A  32)— (A  33)  are  also  perfectly  valid  here. 

Appendix  C 

Role  of  the  stabilizing  gain  K0 

Let  us  assume  that  we  used  the  fixed  gain  K0  and  control  the  plant 

xp(r)  =  /t0(xp)xp(t)  +  Bp(xp)up(t)  (C  1) 

ya(t)  =  Cp(xp)xp(r)  +  Dp(xp)xp(t)  (C  2) 

where 

A0(xp)  =  Ap(xp)  -  Bp(xp)K0Cp(xp)  (C  3) 

Since  (C  1)  is,  by  assumption,  uniformly  asymptotically  stable,  then  it  can  be  seen 
(Bar-Kana  1989  a)  that  there  exist  some  positive  definite  matrices  P0(xp)  and 
Q 0(xp),  and  some  matrices  L0(xp),  W0,  and  D0  such  that  the  following  relations  are 
satisfied: 

P„(xp)  +  P0(xp)A0(xp)  +  Al(xp)P0(xp)  =  -<2o(xP)  -  Ll(xp)L0(xp)  (C  4) 
P0(xp)Bp(xp)  =  Cj(xp)  -  L5(xp)  W0  (C  5) 

D0  +  Dl=WlW0  (C  6) 

Let  us  assume  that  no  external  input  command  or  disturbance  is  present,  and 
that  therefore  um(t)  =  0,  xm(f)  =  0,  ym(t)  =  0,  Kx(t)  =  0,  Ky(t)  =  0,  and  for  convenience, 
Kp(0  =  0,  K,(t)  =  Kty(t)  =  Ke>l{t). 

We  use  the  following  Lyapunov  function: 

K,(f)  =  xJ(r)P0(xp)xp(t)  (C  7) 

and  the  derivative  of  Ki(r)  along  the  trajectories  of  (C  1)-(C  2)  is 

•M0  =  -<pW[Po(^)  +  Po(xP)A0(xp)  +  Al(xp)P0(xp)]xp(t) 

+  xl(t)P0(xp)Bp{xp)up{t)  +  uJ(t)Bj(xp)P0(xp)xp(t) 


(C  8) 
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Substituting  uf(t)  from  (23)  and  using  (C  4)-(C  6)  gives 

^i(0  =  -Xf(t)Q0(xp)xp(t)  -  xrp(t)Ll(xp)L0(xp)xp(t) 

-  xl(t)C](x,)K,(t)ya(t)  -  yl(t)Kj(t)Cp(xp)xp(t) 

+  xltfLtixJWoKMyA)  +  y](t)Kj(t)WlL0(xp)xp{t)  (C  9) 

=  -x](t)Q0(xp)xp(t)-xl(t)Ll(xp)L0(xp)xp(t) 

+  zl(t)Ll(xp)  W0K,(t)yt(t)  +  yJ(r)Kf(t)  WlL0(xp)xp(t) 

-yl(t)KTmD0  +  Dl)K,(t)ya(t) 

-  fJ(f)KT(f)C£>PC^)  + 

-yJ(t)[K,M  +  Ac7(t)]ya(t)  (CIO) 

KW=  -xJ(t)Qo(xP)xp't) 

-  [L0(xp)xp(t)  -  tVo^(t)yM]T[L0(xp)xp(t)  -  WoKrttyM 
-yI(t)[Kt(t)  +  K!(t)]y.(t) 

+  yJ(t)Kj(t)[D0  +  Dl-  Dp(xp)  -  Dl(xp)-]K,(t)ya(t)  (C  1 1) 

Because  only  the  last  term  in  (C  1 1)  is  not  negative,  then  if  either  ||ya(r)||,  ||K(f)|| 
or  ||K(f)ya(f)ll  becomes  small,  the  adaptive  system  enters  the  domain  of  attraction 
of  the  equilibrium  point  (ya(r)  =  0,  K(t)  =  K0).  Because  boundness  of  all  values  is 
guaranteed  (Appendixes  A  and  B),  the  conclusions  that  follow  (30)  are  immediate. 

We  now  show  that  for  any  K0  >  KmJn,  the  point  (xp  =0,  K,(t)  =  0)  or  (xp  =  0, 
K(t)  -  K0)  is  also  the  only  equilibrium  poi:.  of  the  system,  if  we  restrict  the  discussion 
to  those  non-linear  systems  for  which  uniform  asymptotic  stability  guarantees  that 
the  system  matrix  is  non-singular. 

We  then  assume  that  before  applying  the  adaptive  controller,  we  used  some 


preliminary  constant  matrix  K0  such  that  the  system  represented  by 

A0(*p)  =  Ap{xp)  -  Bp(xp)K0Cp(xp)  (C  12) 

is  uniformly  asymptotically  stable. 

The  maximal  admissible  (supplementary)  feedback  gain  is  then 

o  (C  13) 

and  therefore 

Dp  =  [Kmtt-K0yl  (C 14) 

From  (C  1)  we  get 

xp(0  -  AK{xp)x„{t)  (C  15) 

where 

■4x(xp)  =  A0(xp)-  Bp(xp)Kc{t)Cp(xp)  (C  16) 

«,«  =  -  K,(t)ya(t )  =  -K,(t)Cp(xp)xp(t)  -  K,(t)Dpxp(t)  (C  17) 
up(t)=-Kc(t)Cp(xp)xp(t)  (C  18) 

A\(t)  =  [/  +  *,  (r)Dp]-%(r)  (C 19) 
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Form  (C  19)  it  can  be  seen  again  that 

0  <  K/(t)  <  oo  implies  0  <  Ke(t)  <  D~ 1  =  Kmtl  -  K0  ■  (C  20) 
The  equilibrium  points  of  (C  15)  and  (27)  are  obtained  from 

'M*p)*p(f)  =  o  (C21) 

<rK,(t)-ya(t)y!(t)r=o  (C  22) 


If  >  /v„|n,  then  /lK(xp)  is  stable  for  any  value  of  K((t)  and  it  is.  therefore,  non¬ 
singular.  Jn  this  case  (C  21)  has  only  the  unique  solution  xp(r)  =  0.  and  then,  from 
(C  22)  we  get  K,(t)  =  0. 

In  conclusion,  if  K0  >  Kmin ,  then  the  equilibrium  point  (xp(t)  =  0,  K,(t)  =  0)  or 
(xp(t)  =  0,  K(t)=  K0)  is  both  stable  (attractive)  and  unique. 
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